REPORT RESUMES



ED 019 118 PS 000 036 

FINAL REPORT ON HEAD START EVALUATION AND RESEARCH--1966-67
TO THE INSTITUTE FOR EDUCATIONAL DEVELOPMENT. SECTION II, ON
THE INTERPRETATION OF MULTIVARIATE SYSTEMS.

BY- LAND, KENNETH C.

TEXAS UNIV., AUSTIN, CHILD DEVELOP. EVAL. AND RES. CTR.

REPORT NUMBER IED-66-1                    PUB DATE 31 AUG 67

EDRS PRICE MF-$0.50 HC-$3.36 83P.

DESCRIPTORS- *RESEARCH METHODOLOGY, *MATHEMATICAL MODELS,
RESEARCH TOOLS, MATHEMATICAL APPLICATIONS, *CRITICAL PATH
METHOD, *STATISTICAL ANALYSIS, ANALYSIS OF VARIANCE,
CORRELATION, OPERATIONS RESEARCH, LINEAR PROGRAMING,

THIS REPORT PRESENTS A DISCUSSION OF 2 TECHNIQUES WHICH
CAN BE USED TO REPRESENT AND INTERPRET MULTIVARIATE
STATISTICAL SYSTEMS WHEN IT IS FELT THAT THERE ARE CAUSAL
RELATIONS BETWEEN SOME OF THE VARIABLES. THE BASIC TECHNIQUE
IS PATH ANALYSIS AND THE OTHER IS ITS EXTENSION THROUGH THE
USE OF RECURSIVE SYSTEMS OF EQUATIONS. THE ANALYSIS IS
RESTRICTED IN APPLICATION TO RELATIONSHIPS BETWEEN
INTERVAL-MEASURABLE VARIABLES THAT ARE LINEAR, ADDITIVE, AND
ASYMMETRIC. TO MAKE A PATH ANALYSIS, THE VARIABLES IN THE
SYSTEM ARE CLASSIFIED AS EITHER EXOGENOUS, THAT IS, HAVING
THEIR VALUES DETERMINED BY FACTORS OUTSIDE THE SYSTEM, OR
ENDOGENOUS, THAT IS, HAVING THEIR VALUES DETERMINED BY
FACTORS REPRESENTED BY VARIABLES WITHIN THE SYSTEM. BASED ON
THIS ANALYSIS, A SET OF REGRESSION EQUATIONS REPRESENTING
THESE RELATIONS IS FORMED. THIS SET IS TERMED THE PATH MODEL,
AND GRAPHIC CONVENTIONS ARE GIVEN FOR DIAGRAMING IT. THE
COEFFICIENTS IN THE EQUATIONS ARE SIMILAR TO THE CORRELATION
COEFFICIENTS OCCURRING IN ORDINARY LEAST-SQUARES REGRESSION
EQUATIONS. THE ADVANTAGE OF THE PATH ANALYSIS APPROACH IS
THAT IT ENABLES THE EXPERIMENTER TO UTILIZE ALL THE
INFORMATION AT HIS DISPOSAL, PARTICULARLY THAT CONCERNING
CAUSAL RELATIONS BETWEEN VARIABLES. THE TECHNIQUE IS
ILLUSTRATED WITH APPLICATIONS TO BIVARIATE AND MULTIVARIATE
SYSTEMS HAVING SINGLE AND MULTIPLE STAGES OF CAUSAL
INFLUENCE. SOME EXAMPLES DRAWN FROM ACTUAL RESEARCH PROJECTS
ARE INCLUDED. (DR)






FINAL REPORT ON

HEAD START EVALUATION AND RESEARCH: 1966-67
(Contract No. 66-1)

TO

THE INSTITUTE FOR EDUCATIONAL DEVELOPMENT

By

The Staff and Study Directors

CHILD DEVELOPMENT EVALUATION AND RESEARCH CENTER
John Pierce-Jones, Ph.D., Director
The University of Texas at Austin

August 31, 1967

Section II:

ON THE INTERPRETATION OF MULTIVARIATE SYSTEMS

by

Kenneth C. Land




PS 000826 



U. S. DEPARTMENT OF HEALTH, EDUCATION & WELFARE
OFFICE OF EDUCATION

THIS DOCUMENT HAS BEEN REPRODUCED EXACTLY AS RECEIVED FROM THE
PERSON OR ORGANIZATION ORIGINATING IT. POINTS OF VIEW OR OPINIONS
STATED DO NOT NECESSARILY REPRESENT OFFICIAL OFFICE OF EDUCATION
POSITION OR POLICY.








PREFACE

This essay is written for research workers in the behavioral
sciences. My assumptions regarding this intended audience have several
implications for the characteristics of the paper. First, it is not
assumed that research workers have great proficiency in the logical
manipulation of symbols. Hence, this is no place for advanced
mathematical and statistical niceties - rigorous, general, and
aesthetically pleasing though they may be. Furthermore, derivations are
carried out in great detail and accompanied by extensive exposition. The
aim of the paper, although it may be contradictory, is to develop an
informal, intuitive rationale of the material for the reader which
parallels the formal, rigorous reasoning behind the topics. If this goal
is attained, then the researcher should be able to apply the methods
confidently to his own empirical problems. Finally, this paper is
symbolic of my faith that, for at least certain areas of behavioral
science inquiry, the relevant question is no longer "What variables are
important?" but "How are the important variables related?" It is my
belief that the methods presented in this paper are appropriate to the
latter question. On the other hand, straightforward application of
statistical principles of estimation and tests of significance is
probably more relevant to the former.

It is with great pleasure that I acknowledge my indebtedness
to Drs. John Pierce-Jones and Grover Cunningham of the Child Development
Evaluation-Research Center for the time to do this research. Because
of the pressure to produce "significant" empirical findings experienced
in many behavioral science research centers, the work directive to "be
creative" methodologically is all too infrequent. Although I make no
claim to origination of any of the notions in this paper, their synthesis
herein from diverse sources is my response to the above-mentioned
stimulus. This implies, of course, that I am responsible for any errors
in the presentation.

Kenneth C. Land







TABLE OF CONTENTS

SECTION

1. Path Analysis

   1.1. Path Models and Path Diagrams

   1.2. Path Coefficients and Path Regressions

   1.3. The Bivariate Path Model

   1.4. The Multivariate Path Model

   1.5. An Empirical Example of the Multivariate Path Model

2. Recursive Sets of Simultaneous Equations

3. Path Analysis Revisited

   3.1. The Multi-stage, Multivariate Path Model

   3.2. The Multi-stage, Bivariate Path Model

   3.3. The Path Decomposition Model

   3.4. The Basic Assumptions of Path Analysis and Model Testing

ADDITIONAL READINGS

APPENDIX I

APPENDIX II




















LIST OF TABLES

TABLE

1. Estimators of Path Coefficients for the Bivariate Path Model

2. Estimators of Path Coefficients for the Multivariate Path Model

3. Estimators of Path Coefficients for the Path Model of Figure 7(a)

4. Estimators of Path Coefficients for the Path Model of Figure 7(b)

5. Correlation Matrix for Logarithm of Variables in Duncan's
   Path Decomposition Model



















ON THE INTERPRETATION OF MULTIVARIATE SYSTEMS

Kenneth C. Land



This paper constitutes a systematic introduction to two
procedures which have been developed to aid the representation and
interpretation of multivariate statistical systems: path analysis and
recursive systems of equations. The first section defines path models and
path diagrams. It also develops two elementary applications of the
procedure. In the second section, the representation of statistical
systems by recursive sets of equations is discussed. Finally, the third
section of the paper builds on notions of the two preceding sections by
extending path analysis to highly complex systems of relations.

The author has attempted to provide both a systematic and,
to some extent, complete discussion of the topics. Therefore, two
appendices review the basic mathematics of least-squares correlation
and regression. If the reader has difficulty understanding the main
body of the paper because he has forgotten some basic statistical
notions, he may read the appendices and then return to the main sections
of the essay. Furthermore, the author has attempted to show that the
notions of path analysis, at least for the systems discussed in this
paper, follow directly from the basic statistical notions of
least-squares correlation and regression. Finally, a main goal of the
paper is to develop enough basic understanding on the part of the reader
that he may proceed to utilize the method in his own area of research.
Therefore, although the discussion and derivations are on an elementary
level, they are accompanied by detailed exposition to provide at
least an intuitive and informal understanding of what is really going
on.



Section 1. Path Analysis.

The method of path analysis or path coefficients was developed
by the geneticist Sewall Wright in a series of general essays
(Wright, 1921, 1934, 1954, 1960a, 1960b) as an aid to the quantitative
development of genetics. Wright stated the primary purpose of
the method in his first general account (1921) as follows:

    The present paper is an attempt to present a method
    of measuring the direct influence along each
    separate path in such a system and thus of finding
    the degree to which variation of a given effect
    is determined by each particular cause. The
    method depends on the combination of knowledge of
    the degree of correlation among the variables in
    a system with such knowledge as may be possessed
    of the causal relations. In cases in which the
    causal relations are uncertain, the method can be
    used to find the logical consequences of any
    particular hypothesis in regard to them.

Wright elaborated the purpose of the method in subsequent papers:

    ... the method of path coefficients is not
    intended to accomplish the impossible task of
    deducing causal relations from the values of the
    correlation coefficients. (1934) ... Path
    analysis is an extension of the usual verbal
    interpretation of statistics not of the
    statistics themselves. It is usually easy to give a
    plausible interpretation of any significant
    statistic taken by itself. The purpose of path
    analysis is to determine whether a proposed set of
    interpretations is consistent throughout. (1960b)













So much for the intentions of the method. Let us begin by developing
the basic notions of path analysis. From that point, we shall develop
a few simple, almost trivial, applications of the path notions.
Finally, we shall, after the introduction of some additional notions in
the next section of the paper, proceed to the path analysis of complex
systems of relations on variables in which the advantages of the
procedure begin to accumulate in such a manner as to make path analysis
worth the effort of becoming proficient in the method.

1.1. Path Models and Path Diagrams. We begin by restricting
the application of the method to sets of relationships among variables
which are (1) linear, (2) additive, and (3) asymmetric. Furthermore,
the variables must be measurable or be conceived as measurable on an
interval scale, although some of them may not actually be measured. We
shall return to these assumptions at the end of the paper.

In such systems of relationships, a subset of the variables
is taken as linearly dependent on the remaining variables, which are
assumed to be independent. That is, the total variation of the
independent variables is assumed to be caused by variables outside of the
set under consideration. We may refer to such variables as "exogenous."
The exogenous variables in a particular set may be correlated among
themselves; however, the explanation of their intercorrelation is not
a problem for the system under consideration. The subset of variables
which are taken as dependent variables in the total set may be termed
"endogenous" variables. In contrast to the exogenous subset of
variables, the total variation of the endogenous variables is assumed to
be determined by variables within the system. Note that this implies
that, in some path models, a subset of the endogenous variables may be
conceived to have causal effects on other endogenous variables in
addition to the direct effects of the exogenous variables. Furthermore,
in those systems of relationships where an endogenous variable is not
completely determined by prior (exogenous or endogenous) measured
variables, a residual variable uncorrelated with the set of variables
immediately determining the variable under consideration is introduced
to account for the variance of the dependent variable not explained by
measured variables. The basic assumptions of path analysis have been
reviewed in these two paragraphs. Because they are so basic, the reader
may find it useful to re-read the assumptions several times as the
method is developed below.

The notion of the path diagram was developed by Wright (1921,
1934, 1960a) to provide a convenient representation of those systems
of relations which conform to the assumptions of the above paragraphs.
Path diagrams are drawn according to the following conventions:

(1) The postulated causal relations among the variables of
the system are represented by uni-directional arrows extending from
each determining variable to each variable dependent on it.

(2) The postulated non-causal correlations between exogenous
variables of the system are symbolized by two-headed curvilinear arrows
to distinguish them from causal arrows.

(3) Residual variables are also represented by uni-directional
arrows leading from the residual variable to the dependent variable.
However, literal subscripts are attached to residual symbols to
indicate that these variables are not measured.

(4) Finally, the quantities entered beside the arrows on a
path diagram are the symbolic or numerical values of the path and
correlation coefficients of the postulated relationships. The symbolic
form of the path coefficient is $P_{ij}$, where the first subscript i
denotes the dependent variable and the second subscript j denotes the
variable whose determining influence is under consideration. Note
that, since we are considering only asymmetric causal relations, the
coefficients $P_{ij}$ and $P_{ji}$ will never appear in the same path
diagram together, i.e., either $P_{ij}$ or $P_{ji}$, but never both, will
be postulated in a given system. Furthermore, the coefficient $P_{ij}$
will ordinarily be a partial path coefficient; however, we do not denote
the variables held constant after a dot as with ordinary least-squares
partial regression and correlation coefficients. They will usually be
obvious from the path diagram.
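
The four drawing conventions can also be held in a small data structure. The sketch below is only an illustration: the class `PathDiagram` and its method names are hypothetical and do not appear in the report. It records causal arrows keyed by (dependent, determining) in the subscript order of the path coefficient, two-headed correlation arrows, and residual arrows, and it refuses to store both of two opposed causal arrows, as the asymmetry rule in convention (4) requires.

```python
from dataclasses import dataclass, field


@dataclass
class PathDiagram:
    """Hypothetical container for drawing conventions (1)-(4)."""
    causal: dict = field(default_factory=dict)      # (dependent, determining) -> P_ij
    correlated: dict = field(default_factory=dict)  # frozenset({i, j}) -> r_ij
    residual: dict = field(default_factory=dict)    # dependent -> (label, P_ia)

    def add_path(self, dependent, determining, value=None):
        # Convention (4): the first subscript is the dependent variable,
        # and the two opposed arrows never both appear in one diagram.
        if (determining, dependent) in self.causal:
            raise ValueError("asymmetric relations only: both directions postulated")
        self.causal[(dependent, determining)] = value

    def add_correlation(self, i, j, value=None):
        # Convention (2): a two-headed arrow has no direction.
        self.correlated[frozenset((i, j))] = value

    def add_residual(self, dependent, label, value=None):
        # Convention (3): residuals carry a literal (letter) subscript.
        self.residual[dependent] = (label, value)

    def endogenous(self):
        return {dep for dep, _ in self.causal}

    def exogenous(self):
        return {src for _, src in self.causal} - self.endogenous()

    def correlations_are_exogenous(self):
        # Two-headed arrows may join exogenous variables only.
        exo = self.exogenous()
        return all(pair <= exo for pair in self.correlated)


# A diagram with one dependent variable (3), two correlated
# determining variables (1 and 2), and a residual labelled "a".
d = PathDiagram()
d.add_path(3, 1)
d.add_path(3, 2)
d.add_correlation(1, 2)
d.add_residual(3, "a")
```

Keying the causal arrows by (dependent, determining) keeps the stored pair in the same order as the subscripts of the path coefficient, so a diagram and its equations can be read off one another.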

In this paper, we shall use the term path model to refer to
the regression equation or set of regression equations which
represents the postulated causal and non-causal relationships among the
variables under consideration. A property of path diagrams which
conform to the above rules of representation is an isomorphism with the
algebraic and statistical properties of the postulated system of
relationships. In other words, there is a one-to-one correspondence
between postulated causal and non-causal relations of a path model
and its path diagram. This property and its usefulness will become more
obvious as we develop the method. As an illustration of the conventions
of path diagrams, we have a possible system in Figure 1.

1.2. Path Coefficients and Path Regressions. Since all
relations are assumed to be linear, we may write the dependent relationship
of $X_1$ on $X_2, X_3, \ldots, X_n$ and residual $X_a$ in raw-score form
as follows:

$$X_1 = c_{12}X_2 + c_{13}X_3 + \cdots + c_{1n}X_n + c_{1a}X_a \tag{1.1}$$

or, in deviation units, we have

$$(X_1 - M_1) = c_{12}(X_2 - M_2) + c_{13}(X_3 - M_3) + \cdots + c_{1n}(X_n - M_n) + c_{1a}(X_a - M_a)$$

where $M_i$ is the mean of the ith variable. Letting $x_i = (X_i - M_i)$,
this is

$$x_1 = c_{12}x_2 + c_{13}x_3 + \cdots + c_{1n}x_n + c_{1a}x_a \tag{1.2}$$

It is often more convenient to utilize each variable in standard unit
form. Let $Z_i = x_i/\sigma_i$ and $P_{ij} = c_{ij}\sigma_j/\sigma_i$,
where $\sigma_i$ denotes the standard deviation of the ith variable.
Then formula (1.2) becomes

$$Z_1 = P_{12}Z_2 + P_{13}Z_3 + \cdots + P_{1n}Z_n + P_{1a}Z_a \tag{1.3}$$

The coefficients $c_{12}$, $c_{13}$, etc., where the first subscript
denotes the dependent variable while the second subscript denotes the
independent variable, are of the type of partial regression coefficients
but may exist in a system with unmeasured hypothetical variables in
addition to the residual variable. Hence, we shall refer to them as path
regression coefficients. Also, the standardized coefficients $P_{12}$,
etc., are of the type called path coefficients or standardized path
coefficients. Each path coefficient measures the fraction of the standard
deviation of the dependent variable (with the appropriate sign) for which
the designated variable is directly responsible in the sense of the
fraction which would be found if this factor varies to the same extent as
in the observed data while all other variables (including residual
factors) are constant (Wright, 1934). This definition (except for
determination of sign) can be written as follows:



^12 " ^l»23*»*n/a • 



^2^ 34* • • U/ a 




®l*?3*.*n/a 



^2*34..*n/a 



®12 



(1.4) 






where $\sigma_{i \cdot jk \ldots n,a}$ indicates the standard deviation of
the ith variable with variables j through n and residual variable a held
constant. Given this definition of the path coefficient, it is obvious
that the squared path coefficient measures the portion of the variance of
the dependent variable for which the independent variable is directly
responsible.

Now that we have introduced the definition of path coefficients
and path regressions, the alert reader will immediately note a similarity
of path regressions to ordinary least-squares partial regression
coefficients and a similarity of path coefficients to least-squares
standardized partial regression coefficients or beta weights as discussed
in the two appendices of this paper. It is true that for certain kinds of
systems of relationships the path coefficients and regressions are
identical to the least-squares estimators. However, this is not true in
general (see Wright, 1934, 1954, 1960a). It will be our task in the
remainder of this paper to develop the method of path coefficients only
for those cases in which path coefficients and regressions are identical
to the least-squares estimators for correlation and regression
coefficients. We shall find that path analysis yields information about a
statistical system which helps render an interpretation possible. It does
so, not by additional statistical analysis, but primarily by forcing the
researcher to utilize all of the information at his disposal. We proceed
to develop the procedure for elementary applications.

1.3. The Bivariate Path Model. The simplest type of relation
to which path analysis may be applied is the case of a dependent variable
$X_2$, independent or exogenous variable $X_1$, and residual variable
$X_b$:

$$X_2 = c_{21}X_1 + c_{2b}X_b$$

In standard form, the path model is

$$Z_2 = P_{21}Z_1 + P_{2b}Z_b$$

This is simply a case of bivariate least-squares regression with explicit
consideration of the residual term. Figure 2 is a path diagram for this
model. Note that, since $Z_1$ is considered exogenous, $P_{1a} = 1.0$,
i.e., the total variation of $Z_1$ is caused by unmeasured variables or
variables outside the present model. Because this is true for all
exogenous variables, symbols for their residuals and residual path
coefficients are often dropped from the diagram in the interest of
neatness of representation.

In this model, as shown in Table 1, the least-squares estimator
for the path coefficient $P_{21}$ is the correlation coefficient
$r_{21}$, which is also equal to the beta weight in the bivariate case as
is shown in the first appendix of this paper (section A.1.1.). Here, as
in the appendices, least-squares standardized partial regression
coefficients or beta weights are not symbolized by the Greek letter beta,
since we do not have a Greek letter for beta on our keyboard. Now, let us
consider the estimate and meaning of the residual path coefficient. Since
$Z_b$ is independent of $Z_1$ (it was stated earlier that the residual is
independent of the immediate predictors in a model and this also follows
from least-squares estimation principles), we know by the same principle
as for $P_{21}$ that $P_{2b} = r_{2b}$. But since the residual represents
all variables outside the system which cause variation in $Z_2$, it is
unmeasured and we do not have a direct estimate for $P_{2b}$.





FIGURE 2

TABLE 1: ESTIMATORS OF PATH COEFFICIENTS FOR THE BIVARIATE PATH MODEL

(1) Path Model: $Z_2 = P_{21}Z_1 + P_{2b}Z_b$

(2) $P_{21} = r_{21}$ (the beta weight)

(3) $r_{22} = 1 = P_{21}^2 + P_{2b}^2$

(4) $P_{2b} = \sqrt{1 - P_{21}^2} = \sqrt{1 - r_{21}^2}$



Therefore, we must estimate it indirectly by utilizing the fact that the
squares of $P_{21}$ and $P_{2b}$ must sum to unity. That is, since the
square of each of $P_{21}$ and $P_{2b}$ is the proportion of the variance
of $Z_2$ explained by $Z_1$ and $Z_b$, respectively, and the variables
are independent, the sum of the proportions must be unity. Hence,
$P_{2b}$ is the square root of the quantity one minus $P_{21}^2$, or one
minus $r_{21}^2$, as is shown in Table 1. This looks familiar. It is in
fact what we define as the coefficient of alienation in the first
appendix of the paper (section A.1.2.). Now we see the first contribution
of path analysis to the interpretation of regression systems: It provides
a convenient and logically sound interpretation of the coefficient of
alienation as the path coefficient of the residual term in the regression
equation. It may be shown that the residual variable has a zero mean and
unit variance or standard deviation since we have conceived of all of the
variables as being in standard form. Hence, it may be helpful for the
reader to think of the residual as a dummy variable having unit variance
and zero mean and representing all unmeasured variables which cause
variation in the dependent variable. The residual path coefficient, then,
is the proportion of the standard deviation, and its square is the
proportion of the variance, of the dependent variable which is caused by
all (unmeasured) variables outside of the set under consideration in the
path model. It should be noted that this interpretation of the residual
path coefficient is consistent with the derivations in Appendix I of the
paper (equations A1.9, A1.10, and A1.11). However, the hard-nosed reader
who does not trust the above logic may easily demonstrate the relation in
a specific empirical case. If he has a computer regression program
available, he may simply predict a criterion variable from a predictor
variable, print out the residual variable, and then predict the criterion
from the residual. The beta weight in the second prediction should be
comparable to the square root of one minus r-square in the first
prediction.
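
The empirical check just described can be run in a few lines. The sketch below is an illustration under stated assumptions: the data are simulated (the coefficient 0.6, the sample size, and the random seed are arbitrary and are not taken from the report), and the helper names `standardize` and `corr` are hypothetical. It predicts the criterion from the predictor, forms the residual, predicts the criterion from the residual, and compares the second beta weight with the square root of one minus r-square.

```python
import math
import random


def standardize(xs):
    """Return standard scores (mean 0, variance 1, divisor N)."""
    n = len(xs)
    m = sum(xs) / n
    s = math.sqrt(sum((x - m) ** 2 for x in xs) / n)
    return [(x - m) / s for x in xs]


def corr(xs, ys):
    """Correlation as the mean cross-product of standard scores."""
    zx, zy = standardize(xs), standardize(ys)
    return sum(a * b for a, b in zip(zx, zy)) / len(xs)


random.seed(0)  # arbitrary seed, so the run is repeatable
x1 = [random.gauss(0, 1) for _ in range(500)]      # predictor
x2 = [0.6 * a + random.gauss(0, 1) for a in x1]    # criterion

z1, z2 = standardize(x1), standardize(x2)

# First prediction: in the bivariate case the beta weight is r_21.
p21 = corr(x1, x2)

# Residual of the first prediction.
residual = [b - p21 * a for a, b in zip(z1, z2)]

# Second prediction: beta weight of the criterion on the residual.
p2b_from_residual = corr(residual, x2)

# Coefficient of alienation from the first prediction.
p2b_from_formula = math.sqrt(1 - p21 ** 2)
```

With least-squares residuals the two values agree to rounding error within the sample, not merely on average, because the residual is exactly uncorrelated with the predictor.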

1.4. The Multivariate Path Model. We shall now extend path
analysis to the multivariate case. Consider a path model in which an
endogenous variable $Z_3$ is dependent upon exogenous variables $Z_1$ and
$Z_2$ and a residual variable $Z_a$:

$$Z_3 = P_{31}Z_1 + P_{32}Z_2 + P_{3a}Z_a$$

A path diagram of this model is given in Figure 3, and the estimators
for the path coefficients and path regressions are shown in Table 2.

TABLE 2: ESTIMATORS OF PATH COEFFICIENTS FOR THE MULTIVARIATE PATH MODEL
OF FIGURE 3

Standardized Path Coefficients

(1) Path Model for Figure 3: $Z_3 = P_{31}Z_1 + P_{32}Z_2 + P_{3a}Z_a$

(2) $r_{31} = P_{31} + P_{32}r_{12}$

(3) $r_{32} = P_{31}r_{12} + P_{32}$

(4) $r_{12}$ given

(5) $r_{33} = 1 = P_{31}r_{31} + P_{32}r_{32} + P_{3a}^2$

(6) $P_{3a}^2 = 1 - P_{31}r_{31} - P_{32}r_{32} = 1 - R^2$

(7) $P_{3a} = \sqrt{1 - R^2}$

Path Regressions

(8) $b_{31} = c_{31} + c_{32}b_{21}$

(9) $b_{32} = c_{31}b_{12} + c_{32}$

(10) $b_{21}$ and $b_{12}$ given

There are a number of important principles to be gained from
an analysis of Figure 3 and Table 2. First, let us derive the relation
of the standardized path coefficients $P_{ij}$ and the path regression
coefficients $c_{ij}$ to the least-squares correlation coefficients
$r_{ij}$ and regression coefficients $b_{ij}$. For example, take the case
of the path coefficient $P_{31}$. We are given

$$r_{31} = P_{31} + P_{32}r_{12}$$

so, by subtracting $P_{32}r_{12}$ from both sides we have

$$P_{31} = r_{31} - P_{32}r_{12}$$

By the same principle, $P_{32} = r_{32} - P_{31}r_{12}$. So, by
substitution,

$$P_{31} = r_{31} - (r_{32} - P_{31}r_{12})r_{12}$$

$$P_{31} = r_{31} - r_{32}r_{12} + P_{31}r_{12}r_{12}$$

Then, by subtracting $P_{31}r_{12}^2$ from both sides of the equation, the
equation becomes

$$P_{31} - P_{31}r_{12}^2 = r_{31} - r_{32}r_{12}$$

or, by factoring out $P_{31}$ on the left-hand side,

$$P_{31}(1 - r_{12}^2) = r_{31} - r_{32}r_{12}$$

whence, by dividing both sides by $(1 - r_{12}^2)$, we have

$$P_{31} = \frac{r_{31} - r_{32}r_{12}}{1 - r_{12}^2} \tag{1.7}$$

In this form, $P_{31}$ bears immediate similarity to formula (A2.2) and
is, in fact, identical with the least-squares estimator for the
standardized partial regression coefficient. We may derive a similar
formula for the path regression coefficients, $c_{ij}$. The derivation
for $c_{31}$ is as follows:

$$b_{31} = c_{31} + c_{32}b_{21} \tag{1.8}$$

$$c_{31} = b_{31} - c_{32}b_{21}$$

$$c_{31} = b_{31} - (b_{32} - c_{31}b_{12})b_{21}$$

$$c_{31} = b_{31} - b_{32}b_{21} + c_{31}b_{12}b_{21}$$

$$c_{31} - c_{31}b_{12}b_{21} = b_{31} - b_{32}b_{21}$$

$$c_{31}(1 - b_{12}b_{21}) = b_{31} - b_{32}b_{21}$$

So,

$$c_{31} = \frac{b_{31} - b_{32}b_{21}}{1 - b_{12}b_{21}} \tag{1.9}$$

Again, in this form, $c_{31}$ is identical to the least-squares estimator
for the partial regression coefficient in raw- or deviation-score form as
given in Appendix formula (A2.5). The above two derivations demonstrate
that for the multivariate path model, of which Figure 3 is a special
case, the path coefficients and path regressions are equivalent to the
least-squares estimators for the standardized and raw- or deviation-score
partial regression coefficients, respectively.
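
Formulas (1.7) and (1.9) can be checked against one another on data. The sketch below is a minimal illustration on simulated scores; the generating coefficients, the sample size, the seed, and the helper names `cov`, `corr`, and `slope` are all assumptions made for the example and do not come from the report. It computes the path coefficients from the correlations and the path regression from the total regression coefficients, then confirms that the two are related by the ratio of standard deviations and that the path coefficients reproduce formulas (2) and (3) of Table 2.

```python
import math
import random


def mean(xs):
    return sum(xs) / len(xs)


def cov(xs, ys):
    mx, my = mean(xs), mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / len(xs)


def corr(xs, ys):
    return cov(xs, ys) / math.sqrt(cov(xs, xs) * cov(ys, ys))


def slope(ys, xs):
    """Total (bivariate) regression coefficient of ys on xs."""
    return cov(xs, ys) / cov(xs, xs)


random.seed(1)  # arbitrary seed, so the run is repeatable
x1 = [random.gauss(0, 2) for _ in range(400)]
x2 = [0.5 * a + random.gauss(0, 1) for a in x1]
x3 = [1.5 * a - 0.8 * b + random.gauss(0, 1) for a, b in zip(x1, x2)]

r12, r31, r32 = corr(x1, x2), corr(x3, x1), corr(x3, x2)

# Standardized path coefficients, formula (1.7) and its counterpart.
p31 = (r31 - r32 * r12) / (1 - r12 ** 2)
p32 = (r32 - r31 * r12) / (1 - r12 ** 2)

# Path regression, formula (1.9), from the total regression coefficients.
b31, b32 = slope(x3, x1), slope(x3, x2)
b21, b12 = slope(x2, x1), slope(x1, x2)
c31 = (b31 - b32 * b21) / (1 - b12 * b21)

# Standard deviations used to move between the two scales.
sd1 = math.sqrt(cov(x1, x1))
sd3 = math.sqrt(cov(x3, x3))
```

Here `c31` equals `p31 * sd3 / sd1` up to rounding error, which is the raw-score and standard-score versions of the same coefficient.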

A second principle that emerges from Figure 3 and Table 2 is
illustrated by formulas (2) and (3) of Table 2. We know that we are
given the correlations of the exogenous variables in any path model
($r_{12}$ in Figure 3) and we are not interested in them. But formulas
(2) and (3) show the basic composition of the correlation of each
exogenous variable with the endogenous variable. Let us explore the
derivation of, say, formula (2). By definition of the correlation
coefficient for variables in standard-score form, we have

$$r_{31} = \frac{1}{N}\sum Z_3 Z_1 \tag{1.10}$$

where the summation is over all observed Z-values for variables 1 and 3.
The convention regarding summations followed in this paper is that, if
a summation index is not provided, the summation is over observed values;
otherwise, a summation index and explanatory note will be provided.
Regarding (1.10), we know that $Z_3$, the standard-score form of the
criterion variable, is totally dependent upon $Z_1$, $Z_2$, and $Z_a$.
Hence, by substitution from formula (1) of Table 2, the path model, we
have

$$r_{31} = \frac{1}{N}\sum Z_1 (P_{31}Z_1 + P_{32}Z_2 + P_{3a}Z_a) \tag{1.11}$$

or, by expansion,

$$r_{31} = \frac{1}{N}\left(P_{31}\sum Z_1 Z_1 + P_{32}\sum Z_1 Z_2 + P_{3a}\sum Z_1 Z_a\right)$$

$$r_{31} = P_{31}\frac{\sum Z_1^2}{N} + P_{32}\frac{\sum Z_1 Z_2}{N} + P_{3a}\frac{\sum Z_1 Z_a}{N} \tag{1.12}$$

and since (1) the sum of the squares of standard scores for a variable,
divided by N, is unity, (2) the sum of cross-products of standard scores
for two variables, divided by N, is the correlation coefficient of the
variables, and (3) the correlation of the residual $Z_a$ with an
immediate determining variable of $Z_3$ is zero, (1.12) reduces to

$$r_{31} = P_{31} + r_{12}P_{32} \tag{1.13}$$

and we have derived equation (2) of Table 2.



There are several insights to be derived from (1.13). First,
it implies that, for a multivariate path model, the correlation of an
exogenous variable and the dependent variable is the sum of the direct
effect via its path coefficient from that exogenous variable to the
dependent variable and its indirect effect(s) through its correlation
with the other exogenous variable(s) as measured by the product of the
correlation coefficient of the two exogenous variables and the path
coefficient of the latter variable. This is a second contribution of path
analysis to the interpretation of regression systems: It provides an
interpretation of the correlation of a predictor and the criterion as a
sum of direct and indirect effects. This interpretation is certainly not
obvious from the typical formulas for the correlation and beta
coefficients. Of course, a similar type of interpretation holds for
formulas (8) and (9) of Table 2 regarding total and partial regression
coefficients in raw- or deviation-score form.

Let us look again at formula (1.13). If the total correlation
between the exogenous variable $Z_1$ and the endogenous variable $Z_3$ is
made up of a sum of direct and indirect effects, and if the direct effect
is estimated by $P_{31}$, then the indirect effect must be estimated by
$r_{12}P_{32}$, or in a more generally applicable form:

$$\text{Total Indirect Effect (TIE) of } Z_1 \text{ on } Z_3 = r_{31} - P_{31}$$

Thus, we have an example of the third contribution of path analysis to
the interpretation of regression systems: It provides a general
procedure for exploring the indirect effects of an independent variable
on a dependent variable in a multivariate path model. This contribution
will become more interesting as the path models become more complex.
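
The split of a correlation into direct and indirect effects is easy to compute once the three correlations are known. The sketch below uses made-up correlations (0.6, 0.4, and 0.5 are illustrative values, not results from the report), and the function names are hypothetical. It shows that the indirect effect obtained as the product of the correlation between the exogenous variables and the other path coefficient equals the total correlation minus the direct path coefficient.

```python
def path_coefficients(r31, r32, r12):
    """Path coefficients P31 and P32 for two correlated exogenous variables."""
    denom = 1 - r12 ** 2
    return (r31 - r32 * r12) / denom, (r32 - r31 * r12) / denom


def decompose(r31, r32, r12):
    """Split r31 into the direct effect P31 and the indirect effect r12 * P32."""
    p31, p32 = path_coefficients(r31, r32, r12)
    return {"total": r31, "direct": p31, "indirect": r12 * p32}


# Hypothetical correlations, chosen only for illustration.
effects = decompose(r31=0.6, r32=0.4, r12=0.5)
tie = effects["total"] - effects["direct"]  # total indirect effect of Z1 on Z3
```

With these values the direct effect is about 0.53 and the indirect effect about 0.07, so most of the correlation of 0.6 is carried by the direct path.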









Equation (5) of Table 2 may be viewed as a special case of (2)
and (3): the case of complete determination of the dependent variable.
Let us explore the derivation of this equation. We know, as before, that
the formula for the correlation of $Z_3$ with itself is:

$$r_{33} = 1 = \frac{1}{N}\sum Z_3 Z_3$$

By substitution of formula (1) of Table 2, this becomes

$$r_{33} = 1 = \frac{1}{N}\sum Z_3 (P_{31}Z_1 + P_{32}Z_2 + P_{3a}Z_a)$$

$$= \frac{1}{N}\left(P_{31}\sum Z_3 Z_1 + P_{32}\sum Z_3 Z_2 + P_{3a}\sum Z_3 Z_a\right)$$

$$= P_{31}\frac{\sum Z_3 Z_1}{N} + P_{32}\frac{\sum Z_3 Z_2}{N} + P_{3a}\frac{\sum Z_3 Z_a}{N}$$

For the same reasons as given above for the derivation of (1.13) and the
fact that, since $Z_a$ is independent of $Z_1$ and $Z_2$,
$r_{3a} = P_{3a}$, this becomes

$$r_{33} = 1 = P_{31}r_{31} + P_{32}r_{32} + P_{3a}^2 = \sum_{i=1}^{2} P_{3i}r_{3i} + P_{3a}^2 \tag{1.16}$$

which is (5) of Table 2. This equation may be further expanded by the
substitution of values for $r_{31}$ and $r_{32}$ as follows:

$$r_{33} = 1 = P_{31}(P_{31} + P_{32}r_{12}) + P_{32}(P_{31}r_{12} + P_{32}) + P_{3a}^2$$

$$= P_{31}^2 + P_{31}P_{32}r_{12} + P_{32}P_{31}r_{12} + P_{32}^2 + P_{3a}^2$$

$$= \sum_{i=1}^{2} P_{3i}^2 + 2\sum_{k>j} P_{3j}P_{3k}r_{jk} + P_{3a}^2 \tag{1.17}$$

where the range of j and k (k > j) includes all measured variables.







• 18 » 



let ue examine the Inoltcatlona of eonationa and 

First, note that the single aumnation in (l»I6) ia equal to the sum of 
the two suBmations in (l*17)« Second, note that (L17) ia identical to 
equation (A2.13) of Appendix Two except for the explicit consideration of 
the residual path coefficient squared - 4* This iiiq;>lie8 that the sum 
of the two aumnations in (l*17)> and^ therefore^ the single aumnation in 
(l*16)> is equal to - the square of the multiple correlation eo» 
efficient for the path model. This means that we can derive a simple 
formula for the confutation of the residual path coefficient as follows^ 

*’ 31*31 

4 + 2 (K>J) 



Pjg - ' ( 1 . 18 ) 

A third implication of (1.17) concerns the important problem
of the interpretation of the path coefficients in the multivariate path
model. Equation (1.17) shows that the total variance of $Z_3$ is a sum of
the squares of each path coefficient plus a term which measures the
correlational influence of the exogenous variables. Or, in other words, by
multiplying both sides of (1.17) by $\sigma_3^2$, it may be seen that the squared
path coefficients measure the portions of $\sigma_3^2$ that are determined directly
by the exogenous variables while the other summation (which may be
negative) measures correlational determination. Now we come to a unique
characteristic of multivariate path coefficients or beta weights as
opposed to their bivariate counterparts: Whereas bivariate path or beta
coefficients, since they are identical with the correlation coefficient,
are bounded by + and - 1.0, the multivariate path coefficients may
exceed + 1 or - 1 in absolute value; hence, the square of a multivariate
path coefficient may exceed + 1. This would seem to indicate that the
independent variable with a path coefficient squared greater than 1.0
causes more than 100 per cent of the variance in the dependent variable.
But this is impossible. So the question is: How does one interpret a
squared multivariate path coefficient greater than 1.0? Wright (1960a)
gives the following interpretation:

Such a value shows at a glance that direct action
of the factor in question is tending to bring about
greater variability than is actually observed.
The direct effect must be offset by opposing
correlated effects of other factors.

In short, the key to this difficulty would seem to lie in the
definition of the path coefficient as given in equation (1.4) and in
relation (1.17): If a multivariate path coefficient is greater than + or
- 1, then one or the other of the terms in (1.4) must be greater than
+ or - 1. Furthermore, the correlation of the exogenous variable with
the other exogenous variable(s) as in equation (1.17) must be such as
to compensate for the tendency of the exogenous variable to cause more
variation in the dependent variable than is observed in the data of
concern. In a specific empirical example, it may be fruitful to examine
each of the specific terms of the second summation in (1.17) to
provide insight as to how the correlation of the independent variable
of interest with the other exogenous variables is compensating for its
large direct path coefficient. This exploration may lead to insight, for
example, as to how the path model could be altered to achieve the same
multiple correlation with fewer exogenous variables. As for the residual
path coefficient, its interpretation remains the same as for the bivariate
path model.
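To make Wright's interpretation concrete, the following sketch works a small three-variable case in which one path coefficient exceeds 1. The three correlations are hypothetical values chosen for illustration (they are not taken from this report), and the variable names are our own. The path coefficients are obtained by solving the two normal equations for $r_{31}$ and $r_{32}$.

```python
# Hypothetical correlations (assumed for illustration, not from the report):
# r12 between the two exogenous variables, r31 and r32 with the endogenous Z3.
r12, r31, r32 = 0.8, 0.9, 0.5

# Solve r31 = P31 + P32*r12 and r32 = P31*r12 + P32 for the path coefficients.
det = 1.0 - r12 ** 2
p31 = (r31 - r12 * r32) / det
p32 = (r32 - r12 * r31) / det

direct = p31 ** 2 + p32 ** 2            # sum of squared path coefficients
correlational = 2.0 * p31 * p32 * r12   # second summation in (1.17)
r_squared = direct + correlational      # squared multiple correlation
residual_sq = 1.0 - r_squared           # squared residual path coefficient (1.18)

print(f"P31 = {p31:.4f}, P32 = {p32:.4f}")
print(f"direct = {direct:.4f}, correlational = {correlational:.4f}")
print(f"R^2 = {r_squared:.4f}, residual^2 = {residual_sq:.4f}")
```

With these assumed inputs the first path coefficient is larger than 1, the squared direct effects sum to more than 1, and the negative correlational term brings the total back to a value of $R^2$ below 1, which is the compensation described above.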

For the reader who is weary of this abstract discussion of
multivariate path models, we shall give an empirical example. However,
before we leave this topic, let us generalize several important formulas
to an n variable multivariate model. The general multivariate path model
is

$$
Z_1 = P_{12}Z_2 + P_{13}Z_3 + \cdots + P_{1n}Z_n + P_{1a}Z_a
\tag{1.19}
$$

where $Z_1$ is arbitrarily taken as representing the endogenous variable,
$Z_2, \ldots, Z_n$ as exogenous variables, and $Z_a$ as residual. The general
formula for the correlation of any exogenous variable with the endogenous
variable becomes

$$
r_{1i} = P_{1i} + \sum_{j \neq i} P_{1j}r_{ij} = \sum_{j=2}^{n} P_{1j}r_{ij}
\tag{1.20}
$$

Then the formula for the total indirect effect of any exogenous variable
$Z_i$ on the endogenous variable is

$$
\text{Total Indirect Effect (TIE) of } Z_i \text{ on } Z_1 = r_{1i} - P_{1i}
\tag{1.21}
$$

Also, the formula for the complete determination of $Z_1$ becomes

$$
r_{11} = 1 = \sum_{i=2}^{n} P_{1i}^2 + 2\sum_{j,\,k>j} P_{1j}P_{1k}r_{jk} + P_{1a}^2
\tag{1.22}
$$

Finally, the formula for the residual in the general multivariate path
model is

$$
P_{1a} = \sqrt{1 - R^2}
\tag{1.23}
$$

Figure 4 illustrates the general multivariate path diagram.




FIGURE 4.
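Formulas (1.20) through (1.23) can be sketched in a few lines of code. The sketch below assumes that the path coefficients are obtained by solving the system (1.20) for the $P_{1j}$ given the observed correlations; the function names `solve` and `path_model` and the numerical correlations are hypothetical choices of ours, not part of the report.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gauss-Jordan elimination."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]   # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]            # partial pivoting
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [x - f * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def path_model(r_xx, r_xy):
    """Return path coefficients, total indirect effects, R^2, residual path."""
    p = solve(r_xx, r_xy)                              # inverts formula (1.20)
    tie = [r - pi for r, pi in zip(r_xy, p)]           # formula (1.21)
    n = len(p)
    direct = sum(pi ** 2 for pi in p)
    corr = 2.0 * sum(p[j] * p[k] * r_xx[j][k]
                     for j in range(n) for k in range(j + 1, n))
    r2 = direct + corr                                 # (1.22) without residual
    return p, tie, r2, (1.0 - r2) ** 0.5               # formula (1.23)


# Hypothetical correlations among three exogenous variables (r_xx) and
# between each of them and the endogenous variable (r_xy).
r_xx = [[1.0, 0.3, 0.2],
        [0.3, 1.0, 0.4],
        [0.2, 0.4, 1.0]]
r_xy = [0.5, 0.4, 0.3]

p, tie, r2, p_res = path_model(r_xx, r_xy)
print("path coefficients:", [round(x, 4) for x in p])
print("total indirect effects:", [round(x, 4) for x in tie])
print("R^2:", round(r2, 4), "residual path:", round(p_res, 4))
```

A plain elimination routine is used here only to keep the sketch free of external libraries; any standard linear solver would serve the same purpose.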



1.5. An Empirical Example of the Multivariate Path Model. This
empirical problem involves data kindly provided by Dr. Grover Cunningham
of the Child Development Evaluation-Research Center. He postulated the
dependency of a child's IQ score on 17 measures of correlation of
personality characteristics of his parents. The exact form of his path
model is

$$
\begin{aligned}
Z_{18} ={}& P_{18,1}Z_1 + P_{18,2}Z_2 + P_{18,3}Z_3 + P_{18,4}Z_4 + P_{18,5}Z_5 \\
          &+ P_{18,6}Z_6 + P_{18,7}Z_7 + P_{18,8}Z_8 + P_{18,9}Z_9 \\
          &+ P_{18,10}Z_{10} + P_{18,11}Z_{11} + P_{18,12}Z_{12} + P_{18,13}Z_{13} \\
          &+ P_{18,14}Z_{14} + P_{18,15}Z_{15} + P_{18,16}Z_{16} + P_{18,17}Z_{17} \\
          &+ P_{18,a}Z_a
\end{aligned}
\tag{1.24}
$$

where $Z_i$, i = 1, ..., 17 are measures of correlation of personality
attributes of parents, $Z_{18}$ is the child's IQ score, and $Z_a$ is the residual
factor. A path diagram of the postulated model is given in Figure 5.

All of the postulated causal relations are drawn in Figure 5.
However, note that none of the postulated correlational relationships
are drawn in the figure. The reason for this departure from the norm
is that, as the number of exogenous variables increases in a
multivariate path model, the number of correlational relationships between
the exogenous variables increases according to the formula for the binomial
coefficient:

$$
\binom{n}{k} = \frac{n!}{k!\,(n-k)!}
\tag{1.25}
$$

where n is the total number of exogenous variables, k is the number
of variables taken in a combination (this will nearly always be 2 for
multivariate path models), and ! is the factorial operator. Hence, the
number of correlations between the 17 exogenous variables taken 2 at a
time would be:

$$
\binom{17}{2} = \frac{17!}{2!\,15!} = \frac{272}{2} = 136 \text{ correlations}
$$

In short, the representation of all 136 correlations among the exogenous
variables would make Figure 5 look more like abstract art than a path
diagram. Therefore, we shall leave correlational arrows out of Figure 5
and choose to keep in mind the posited intercorrelations among the
exogenous variables.
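The count in (1.25) is easy to check numerically. The short sketch below evaluates the binomial coefficient from the factorial definition and compares it with the standard-library routine; the function name `n_correlations` is our own.

```python
from math import comb, factorial


def n_correlations(n, k=2):
    """Number of distinct combinations of n exogenous variables taken k at a time."""
    return factorial(n) // (factorial(k) * factorial(n - k))


# The count grows quickly with the number of exogenous variables.
for n in (2, 3, 5, 10, 17):
    print(n, n_correlations(n))

# The factorial definition agrees with math.comb for the 17-variable model.
assert n_correlations(17) == comb(17, 2)
```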

The numerical rather than the symbolic values for the path
coefficients have been entered in Figure 5. From these values, it is
obvious that exogenous variables 10, 12, and 16 make the largest direct
contributions to the variance of $Z_{18}$. However, the residual variable
$Z_a$ still accounts for about 42 per cent of the variance of children's
IQ scores. The equation for total determination of $Z_{18}$ with the actual
numerical values for the direct effects, correlational effects, and
residual effects corresponding to equation (1.22) is

$$
r_{18,18} = 1.0 = 1.2654 - .6869 + .4215
\tag{1.26}
$$

This equation shows that the correlational effect is negative, quite
large, and counteracts the overdetermination of the IQ scores through the
direct effects of the seventeen exogenous variables.

Let us look at the total indirect effects of each of the variables:

    TIE of  1 on 18  =  r18,1  - P18,1   =   .153 -   .148   =   .005
    TIE of  2 on 18  =  r18,2  - P18,2   =  -.089 - (-.143)  =   .054
    TIE of  3 on 18  =  r18,3  - P18,3   =  -.021 - (-.235)  =   .214
    TIE of  4 on 18  =  r18,4  - P18,4   =  -.009 -   .117   =  -.126
    TIE of  5 on 18  =  r18,5  - P18,5   =   .102 -   .117   =  -.015
    TIE of  6 on 18  =  r18,6  - P18,6   =  -.227 - (-.221)  =  -.006
    TIE of  7 on 18  =  r18,7  - P18,7   =  -.105 - (-.240)  =   .135
    TIE of  8 on 18  =  r18,8  - P18,8   =  -.068 - (-.249)  =   .181
    TIE of  9 on 18  =  r18,9  - P18,9   =  -.158 -   .200   =   .081
    TIE of 10 on 18  =  r18,10 - P18,10  =   .068 -   .441   =  -.373
    TIE of 11 on 18  =  r18,11 - P18,11  =  -.088 - (-.196)  =   .108
    TIE of 12 on 18  =  r18,12 - P18,12  =  -.074 - (-.474)  =   .400
    TIE of 13 on 18  =  r18,13 - P18,13  =   .120 -   .164   =  -.044
    TIE of 14 on 18  =  r18,14 - P18,14  =   .219 -   .339   =  -.120
    TIE of 15 on 18  =  r18,15 - P18,15  =   .161 -   .207   =  -.048
    TIE of 16 on 18  =  r18,16 - P18,16  =  -.260 - (-.525)  =   .265
    TIE of 17 on 18  =  r18,17 - P18,17  =   .095 -   .128   =  -.033



There are two observations which should be made regarding the
estimates of the total indirect effect of each exogenous variable. First,
note that, although the variables with large direct effects tend also to
have large indirect effects, the rank-order of the variables by size of
indirect effects is quite unlike the rank-order according to size of
direct effects. Second, note that, for the three-variable multivariate
path model discussed in section 1.4, the total indirect effect was the
only indirect effect for each exogenous variable, e.g., variable one had
only one indirect effect - through its correlation with variable two and
the direct effect of variable two on the endogenous variable. However,
in this problem with seventeen exogenous variables, each exogenous
variable has sixteen different indirect effects, i.e., each exogenous
variable has an indirect effect through its correlation with each of the
other exogenous variables. In general, if there are n exogenous
variables in a multivariate path model, then there will be n - 1 indirect
effects for each exogenous variable. This result follows directly from
formula (1.20). As an example, in the present model the indirect
effect of variable 1 on variable 18 through variable 2 is the product
$r_{12} \cdot P_{18,2}$, with $P_{18,2} = -.143$, which equals 0.036, while its
indirect effect through variable 3 is the product
$r_{13} \cdot P_{18,3} = (-.170)(-.235) = 0.040$.
Of course, the sum of all of the indirect effects of an exogenous variable
through all of the other exogenous variables is equal to the total
indirect effect of the variable. As an aid to the interpretation of a
specific empirical problem, it may be useful to examine the separate
indirect effects of each variable. It is out of such painstaking
empirical examinations that a multivariate behavioral science will be
built.
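The decomposition of a total indirect effect into its n - 1 separate components can be sketched as follows. The path coefficients and correlations in the dictionaries `p` and `r` are hypothetical values for a model with three exogenous variables (numbered 2, 3, and 4); only the final line uses figures quoted in the text above.

```python
# Hypothetical path coefficients P_1j of exogenous variables 2, 3, 4 on Z1.
p = {2: 0.40, 3: 0.25, 4: -0.10}
# Hypothetical correlations among the exogenous variables.
r = {(2, 3): 0.30, (2, 4): 0.20, (3, 4): 0.50}


def corr(i, j):
    """Correlation between exogenous variables i and j (symmetric lookup)."""
    return 1.0 if i == j else r[(min(i, j), max(i, j))]


def indirect_effects(i):
    """Indirect effect of variable i through each other exogenous variable j."""
    return {j: corr(i, j) * p[j] for j in p if j != i}


def implied_correlation(i):
    """Formula (1.20): direct effect plus the sum of all indirect effects."""
    return p[i] + sum(indirect_effects(i).values())


for i in p:
    parts = indirect_effects(i)
    total = sum(parts.values())          # equals r_1i - P_1i, formula (1.21)
    print(i, {j: round(v, 4) for j, v in parts.items()}, round(total, 4))

# Check of the product quoted in the text: r13 * P18,3 = (-.170)(-.235).
print(round((-0.170) * (-0.235), 3))
```

Each exogenous variable here has two separate indirect effects, one through each of the other two variables, in line with the n - 1 rule stated above.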







Section 2: Recursive Sets of Simultaneous Equations.

Up to this point in the paper, we have developed path models
for elementary multivariate systems. By "elementary" we mean systems
of variables such that the postulated relationship is of an endogenous
variable dependent upon a number of exogenous variables which are taken
to be caused by variables outside of the set under consideration. The
reader who has systematically read the preceding section should be able
to construct and interpret a multivariate path model for any elementary
multivariate system he may encounter. However, the elementary
multivariate path model is not sufficient for all of the types of
multivariate systems which the behavioral scientist frequently must analyze.
Specifically, we often are willing to postulate that an exogenous
variable affects an endogenous variable through its direct effect on
another variable. Or, in other situations, we are willing to postulate the
causal dependency of what we had considered an exogenous variable. This
type of multivariate system is illustrated by Figure 1 of section 1.1. In
short, we often wish to isolate "stages" of causation. We shall take a brief
excursion from path analysis in this section to develop a tool which
will allow us to represent such multivariate systems with path models.

Obviously, the problem posed in the preceding paragraph is
the simultaneous representation of several relationships among a set of
variables rather than one particular relationship taken by itself.
Furthermore, the reader has undoubtedly been exposed to the dictum that in
order to represent several relationships at the same time one must write
and solve a set of simultaneous equations. The question is: which of the
many possible sets of simultaneous equations are consistent with our
assumption of asymmetrical causal ordering and allow a simple least-squares
solution? Let us examine the following simultaneous equations in which
we have represented each of $Z_1, Z_2, \ldots, Z_n$ (in standard units) as a
dependent variable with residuals $Z_a, Z_b, \ldots, Z_k$:

$$
\begin{aligned}
Z_1 &= P_{12}Z_2 + P_{13}Z_3 + \cdots + P_{1n}Z_n + P_{1a}Z_a \\
Z_2 &= P_{21}Z_1 + P_{23}Z_3 + \cdots + P_{2n}Z_n + P_{2b}Z_b \\
    &\;\;\vdots \\
Z_n &= P_{n1}Z_1 + P_{n2}Z_2 + \cdots + P_{n,n-1}Z_{n-1} + P_{nk}Z_k
\end{aligned}
\tag{2.1}
$$

This type of simultaneous equation structure will not suffice for our
purposes, because it does not meet the two conditions stated above.
First, it does not represent an asymmetrical causal system since it
includes both the path coefficients $P_{ij}$ and $P_{ji}$ for each i and j.
Second, unless some of the path coefficients are set equal to zero, there
is no set of values which yields a unique solution for the path
coefficients, and least-squares procedures cannot be utilized to solve the
system (Blalock, pp. 53-54).

We can solve both of the above difficulties by adjusting the
system so as not to permit two-way causation. This implies that, if
we allow for the possibility that $P_{ij} \neq 0$, then $P_{ji}$ must equal zero.
In equations (2.1), let us set each $P_{ij}$ equal to zero if j > i. This
condition gives the following type of simultaneous structure which is
called a recursive system of simultaneous equations:

$$
\begin{aligned}
Z_1 &= P_{1a}Z_a \\
Z_2 &= P_{21}Z_1 + P_{2b}Z_b \\
Z_3 &= P_{31}Z_1 + P_{32}Z_2 + P_{3c}Z_c \\
    &\;\;\vdots \\
Z_n &= P_{n1}Z_1 + P_{n2}Z_2 + \cdots + P_{n,n-1}Z_{n-1} + P_{nk}Z_k
\end{aligned}
\tag{2.2}
$$

There are several important properties of recursive equations
as illustrated by (2.2). First, note that $Z_1$ is taken to be caused only
by variables that are outside of the set under consideration. Hence, $Z_1$
corresponds to what we have called an exogenous variable. However, $Z_2$ is
causally dependent on $Z_1$ as well as a residual variable. Furthermore, $Z_3$
is causally dependent on both $Z_1$ and $Z_2$ and a residual variable. Finally,
$Z_n$ is dependent on all of the other $Z_i$ and a residual variable. A second
property of recursive systems is that any of the remaining path
coefficients may be set equal to zero if it does not reflect a postulated
causal dependency. For example, if we set $P_{21} = 0$, then we have a
representation of a multivariate system in which there are two exogenous
variables - $Z_1$ and $Z_2$. Finally, perhaps the most important property of
recursive systems of regression equations is that we can make use of
ordinary least-squares procedures to estimate the postulated path
coefficients (Wold and Jureen, pp. 51-52).
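As a minimal sketch of the last property, the recursive system (2.2) for three variables can be estimated one equation at a time from the observed correlations. The three correlations below are hypothetical, and the sketch assumes standardized variables, so that the least-squares estimates are the solutions of the normal equations of each regression taken separately.

```python
# Hypothetical observed correlations among Z1, Z2, Z3 (standard units assumed).
r21, r31, r32 = 0.5, 0.4, 0.6

# Equation for Z2: a single predictor, so the path coefficient equals r21.
p21 = r21
p2b = (1.0 - p21 * r21) ** 0.5          # residual path for the Z2 equation

# Equation for Z3: two predictors; solve its two normal equations
#   r31 = P31 + P32*r21   and   r32 = P31*r21 + P32.
det = 1.0 - r21 ** 2
p31 = (r31 - r21 * r32) / det
p32 = (r32 - r21 * r31) / det
r2_z3 = p31 * r31 + p32 * r32           # squared multiple correlation
p3c = (1.0 - r2_z3) ** 0.5              # residual path for the Z3 equation

print(f"P21 = {p21:.4f}, P2b = {p2b:.4f}")
print(f"P31 = {p31:.4f}, P32 = {p32:.4f}, P3c = {p3c:.4f}")
```

No equation needs information from a later equation, which is why each stage can be fitted by an ordinary regression on the variables that precede it.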






Let us now try to grasp some intuitive understanding of what
the term "recursive" means when applied to a system of equations and build
a rationale for an assumption about recursive equations which we will find
convenient to make. First, note that, if we enter a recursive set of
equations to determine the value of, say, $Z_j$, we will find it is
dependent on the values of all $Z_i$ for i < j. If we proceed to inquire into
the determination of the values of each of these $Z_i$ in turn, we will find
that they are dependent on the variables which preceded them in the
system until at last we come to the variable(s) the value of which is
simply taken as given or observed and caused by no explicitly considered
variable. Thus, the statement that a system of equations is recursive
means that there is at least one variable the value of which is not in
question and which successively enters into the determination of every
other variable in the system either directly or indirectly through the
determination of an intermediate variable. Now consider what happens
in equation (2.2) if an increase in the value of $Z_b$ is associated with
an increase in the value of $Z_c$, i.e., if there is a positive correlation
between "all other" variables which cause variation in $Z_2$ and those which
cause variation in $Z_3$. Then, as $Z_b$ (and hence $Z_2$) increases, the value
of $Z_c$ will also increase. This will cause the estimate for $P_{32}$ to be
spuriously high in order to compensate for the confounding effect of $Z_c$.

Similar types of side-effects can be traced through equations (2.2) if
any of the residual terms are correlated. Hence, in order to assure
that we get unbiased estimates of the path coefficients in path models
involving recursive systems of equations, we shall have to assume that
the residual terms in each of the equations are uncorrelated.
Practically, this means that, in a specific empirical problem, we shall want
to bring as many as possible of the common components of the residuals
explicitly into the path model so that the residuals will have, or
approximate, zero correlation. Furthermore, we shall find that we can test
this assumption in certain models.

Section 3: Path Analysis Revisited.

3.1. The Multi-stage, Multivariate Path Model. Consider again
the problem posed at the beginning of Section 2. Briefly, the type of
system we would like to handle concerns "stages" or "chains" of
causation. The notion of recursive systems of equations provides a tool for
the development of a general type of model for such multivariate systems
which we shall term the multi-stage, multivariate path model. Because
there are so many possible specific uses of this model, it is virtually
impossible to discuss it in general. Therefore, we shall utilize an
example from a sociological problem posed to the author by David C. Eaton.
Eaton was concerned with the explanation of the personal income of heads
of households in the United States in the year 1959 by a limited number
of personal characteristics of the heads. From a search of relevant
literature, he was able to find a number of bivariate correlations among
his variables of interest. Furthermore, on the basis of time sequences
and theoretical assumptions, he was willing to postulate a causal
ordering of the variables. Specifically, he was interested in explaining
the personal income of heads of households from their personal
characteristics of (1) race, (2) age, (3) education, (4) occupation, and
(5) fullness-regularity of employment. Since he was not willing to consider
each of these variables as exogenous with no causal relations to or from
the others, the elementary multivariate path model was obviously not the
appropriate model. For example, because of its priority in time (being
determined at birth), race was taken as an exogenous variable which
generally has had an asymmetrical causal effect on the level of education,
status of occupation, and employment-fullness-regularity of heads of
households through institutionalized patterns of racial discrimination
throughout the society. On the other hand, race was postulated to have
only a symmetric non-causal relationship to age which, in turn, was taken
as an exogenous variable having an asymmetrical effect, first of all, on
education because of the differing levels of educational experience of
each age cohort (generation). Age was further postulated to have an
asymmetric positive effect on occupational status of heads of household
through institutionalized patterns of seniority, tenure, and promotion.
This relationship was also expected because of the exclusion of heads
of households 65 or more years old from the sample - an age at which the
relationship would be expected to become negative for a number of
reasons. Finally, age was expected to have direct effects on
employment-fullness-regularity and personal income of heads of households.
Education of heads of households was taken as the first of the dependent
variables. However, in addition to its postulated dependency on race
and age, education was itself taken to have direct effects on the
occupation, employment-fullness-regularity and personal income of the heads
from a consideration of institutionalized patterns of hiring and
employment in industrial societies. Again, for institutional reasons, the status
of a head's occupation was taken to be dependent on his race, age, and
education while, in turn, occupation was postulated to have an
asymmetrical effect on the fullness and regularity of employment and income of
heads. Finally, employment-fullness-regularity was posited as dependent
on the other variables and as determining income. A path diagram for
this complex pattern of relationships is given in Figure 6 with the
numerical values of the postulated path and correlation coefficients.

Having mapped the 6 variables onto a path diagram representing
the rough notions of causation with which we began, it was quite simple
to write the following recursive system of regression equations as the
path model:

$$
\begin{aligned}
Z_1 &= P_{1a}Z_a \\
Z_2 &= P_{2b}Z_b \\
Z_3 &= P_{32}Z_2 + P_{31}Z_1 + P_{3c}Z_c \\
Z_4 &= P_{43}Z_3 + P_{42}Z_2 + P_{41}Z_1 + P_{4d}Z_d \\
Z_5 &= P_{54}Z_4 + P_{53}Z_3 + P_{52}Z_2 + P_{51}Z_1 + P_{5e}Z_e \\
Z_6 &= P_{65}Z_5 + P_{64}Z_4 + P_{63}Z_3 + P_{62}Z_2 + P_{61}Z_1 + P_{6f}Z_f
\end{aligned}
\tag{3.1}
$$

where:

    Z1 = Race of Head
    Z2 = Age of Head
    Z3 = Education of Head
    Z4 = Occupation of Head
    Z5 = Employment-Fullness-Regularity of Head
    Z6 = Personal Income of Head

Furthermore, with the aid of both the path diagram and the path model,
twenty-three specific predictions regarding direct and indirect effects
of the variables were deduced. All but two null hypotheses regarding
these propositions were rejected. The reader is referred to Eaton
(pp. 105-117) for complete details and evaluation.

FIGURE 6.



The results for multivariate path models regarding computation
of residual path coefficients and indirect effects hold also for this
type of model. However, one must calculate a residual path coefficient
for each equation in the path model by using the multiple correlation
coefficient for that equation. As before, the correlation between any
two variables of the model may be expanded along the lines of formula
(1.20). Let us explore the correlation of $Z_5$ and $Z_3$ as an example:

$$
\begin{aligned}
r_{53} &= \frac{1}{N}\sum Z_5 Z_3 \\
       &= \frac{1}{N}\sum Z_3\,(P_{54}Z_4 + P_{53}Z_3 + P_{52}Z_2 + P_{51}Z_1 + P_{5e}Z_e) \\
       &= P_{54}r_{34} + P_{53} + P_{52}r_{23} + P_{51}r_{13}
\end{aligned}
\tag{3.2}
$$

The residual term drops out because $Z_e$ is assumed to be uncorrelated
with $Z_3$. The general form of this expansion theorem for multi-stage,
multivariate path models is

$$
r_{ij} = \sum_{k} P_{ik}r_{jk}
\tag{3.3}
$$

where i and j denote two variables in the system and the index k runs
over all variables from which paths lead directly to $Z_i$. If we continue
to expand (3.2) by means of (3.3), we have

$$
\begin{aligned}
r_{53} &= P_{53} + P_{54}r_{34} + P_{52}r_{23} + P_{51}\,\tfrac{1}{N}\sum Z_1 Z_3 \\
       &= P_{53} + P_{54}r_{34} + P_{52}r_{23} + P_{51}\,\tfrac{1}{N}\sum Z_1 (P_{32}Z_2 + P_{31}Z_1 + P_{3c}Z_c) \\
       &= P_{53} + P_{54}r_{34} + P_{52}r_{23} + P_{51}P_{32}r_{12} + P_{51}P_{31} \\
       &= P_{53} + P_{54}r_{34} + P_{52}\,\tfrac{1}{N}\sum Z_2 (P_{32}Z_2 + P_{31}Z_1 + P_{3c}Z_c) + P_{51}P_{32}r_{12} + P_{51}P_{31} \\
       &= P_{53} + P_{54}r_{34} + P_{52}P_{32} + P_{52}P_{31}r_{21} + P_{51}P_{32}r_{12} + P_{51}P_{31} \\
       &= P_{53} + P_{54}\,\tfrac{1}{N}\sum Z_3 (P_{43}Z_3 + P_{42}Z_2 + P_{41}Z_1 + P_{4d}Z_d) \\
       &\qquad + P_{52}P_{32} + P_{52}P_{31}r_{21} + P_{51}P_{32}r_{12} + P_{51}P_{31} \\
       &= P_{53} + P_{54}P_{43} + P_{54}P_{42}r_{23} + P_{54}P_{41}r_{13} \\
       &\qquad + P_{52}P_{32} + P_{52}P_{31}r_{21} + P_{51}P_{32}r_{12} + P_{51}P_{31} \\
       &= P_{53} + P_{54}P_{43} + P_{54}P_{42}\,\tfrac{1}{N}\sum Z_2 (P_{32}Z_2 + P_{31}Z_1 + P_{3c}Z_c) \\
       &\qquad + P_{54}P_{41}\,\tfrac{1}{N}\sum Z_1 (P_{32}Z_2 + P_{31}Z_1 + P_{3c}Z_c) \\
       &\qquad + P_{52}P_{32} + P_{52}P_{31}r_{21} + P_{51}P_{32}r_{12} + P_{51}P_{31} \\
       &= P_{53} + P_{54}P_{43} + P_{54}P_{42}P_{32} + P_{54}P_{42}P_{31}r_{12} + P_{54}P_{41}P_{32}r_{12} \\
       &\qquad + P_{54}P_{41}P_{31} + P_{52}P_{32} + P_{52}P_{31}r_{21} + P_{51}P_{32}r_{12} + P_{51}P_{31}
\end{aligned}
\tag{3.4}
$$
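The expansion theorem (3.3) lends itself to a short recursive routine. The sketch below assumes a five-variable model with the same structure as the first five equations of the path model above; all numerical path coefficients and the correlation `r12` are hypothetical, and the function name `corr` is ours. It checks that the recursive expansion reproduces the ten-term sum of (3.4).

```python
# Hypothetical path coefficients: paths[i][k] is the path from Z_k to Z_i.
paths = {
    3: {1: 0.30, 2: 0.20},
    4: {1: 0.10, 2: 0.25, 3: 0.40},
    5: {1: 0.15, 2: 0.05, 3: 0.20, 4: 0.35},
}
r12 = 0.10  # hypothetical correlation between the exogenous Z1 and Z2


def corr(i, j):
    """Implied correlation r_ij obtained by repeated use of theorem (3.3)."""
    if i == j:
        return 1.0
    i, j = max(i, j), min(i, j)      # always expand the later variable
    if i not in paths:               # both variables are exogenous
        return r12
    # Residuals are assumed uncorrelated with all prior variables.
    return sum(p * corr(k, j) for k, p in paths[i].items())


P = paths
expanded = (P[5][3] + P[5][4] * P[4][3] + P[5][4] * P[4][2] * P[3][2]
            + P[5][4] * P[4][2] * P[3][1] * r12
            + P[5][4] * P[4][1] * P[3][2] * r12
            + P[5][4] * P[4][1] * P[3][1] + P[5][2] * P[3][2]
            + P[5][2] * P[3][1] * r12 + P[5][1] * P[3][2] * r12
            + P[5][1] * P[3][1])

r53 = corr(5, 3)
total_indirect = r53 - P[5][3]       # total indirect effect of Z3 on Z5
print(round(r53, 6), round(expanded, 6), round(total_indirect, 6))
```

The recursion stops at the exogenous variables, mirroring the way the hand expansion stops once every remaining correlation is either a path coefficient or $r_{12}$.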

This type of expansion can be carried out for all of the correlations
between any two variables in the model. It may yield valuable information
regarding indirect effects. If we subtract $P_{53}$ from both sides of (3.4),
then we will have the familiar formula for the computation of the total
indirect effect of variable 3 on variable 5. Furthermore, in some
empirical problems we may want to examine each of the separate indirect
effects as on the right-hand side of (3.4). An unusual finding
regarding the relationship of status of occupation ($Z_4$) to personal income
of head ($Z_6$) in the present example was that its direct effect was
extremely small (0.020) when employment-fullness-regularity was controlled.
However, the indirect effect of occupation on income through its direct
effect on employment-fullness-regularity ($Z_5$) and the direct effect of
($Z_5$) on income, i.e., the product ($P_{54}$)($P_{65}$), was +0.258. We must
emphasize to the point of becoming polemical that it is through such
detailed findings as this that we will move behavioral research and
theory beyond the quagmire of simplistic bivariate propositions to the
realm of multivariate knowledge.

At this point, we wish to reiterate that the model of Figure 6
is only one example of a possible infinity of specific multi-stage,
multivariate path models. However, it suffices to illustrate the general
principles to follow when dealing with a complex pattern of dependent
relationships. From his experience with such problems, the author
suggests the following steps as a general procedure. First, since the
researcher often approaches multi-stage, multivariate problems with only
crude ideas of the proper causal structure to postulate, he should begin
to formalize his notions by mapping them onto a path diagram which he
may use as a heuristic device until he is satisfied that it represents
the causal sequences as suggested by the current state of theoretical
and empirical knowledge about the variables of interest. In general,
the multi-stage, multivariate path model may include any number of
exogenous variables and any number of causal stages with any number of
dependent variables at each stage. Furthermore, as emphasized in the
discussion of recursive systems, path coefficients from all preceding
variables in the model need not be postulated for subsequent variables
- note that in Figure 6 all path coefficients were postulated - if
there is some theoretical or empirical reason for postulating that they
will be zero or near-zero. Of course, if one or more path coefficients
is predicted to be zero, then the researcher should run the model both
with and without those path coefficients to ascertain whether or not
they actually disappear in the empirical data. Second, if the
researcher is satisfied with the structure represented by his path diagram,
then he should write the path model or set of recursive equations which
is implied by the diagram. This set of equations constitutes the
actual regressions from which he gets estimates of the postulated path
and correlation coefficients. Third, he should compute the residual
path coefficients by applying formula (1.23) to each of the equations
in the path model. Fourth, the researcher should compute estimates of
total indirect effects of prior variables on subsequent variables from
formula (1.21). Fifth, if he is interested in probing in detail the
manner in which a prior variable affects a subsequent variable, then
he is encouraged to expand the correlation between the two variables
along the lines of formula (3.3) and as illustrated by equation (3.4).
This five-fold procedure should facilitate the representation and
interpretation of the most complex sequences of causal dependency.

3.2. The Multi-stage, Bivariate Path Model. If we postulate a
series of causal stages but restrict the number of variables at each stage
to two measured variables and a residual, then we have a special case of
the multi-stage, multivariate path model which we may refer to as the
multi-stage, bivariate path model or simple causal chain. It is
instructive to examine this particular model because of the opportunity it
provides to test the basic assumption of recursive systems of regression
equations, viz., that the residual terms of the equations are uncorrelated.

For purposes of illustrating this model, we shall use the
computations and data reported by Duncan (pp. 10-12). He postulated a
multi-stage, bivariate path model to account for recently reported
correlations between the occupational prestige ratings of four studies
completed at widely separated dates: Counts (1925), Smith (1940), National
Opinion Research Center (1947), and NORC replication (1963). The path
model postulated by Duncan is given by the recursive system:



$$
\begin{aligned}
Z_1 &= P_{1a}Z_a \\
Z_2 &= P_{21}Z_1 + P_{2b}Z_b \\
Z_3 &= P_{32}Z_2 + P_{3c}Z_c \\
Z_4 &= P_{43}Z_3 + P_{4d}Z_d
\end{aligned}
\tag{3.5}
$$

where $Z_1$ = Counts prestige ratings, 1925, $Z_2$ = Smith prestige ratings,
1940, $Z_3$ = NORC prestige ratings, 1947, $Z_4$ = NORC prestige ratings, 1963,
and $Z_a$, $Z_b$, $Z_c$, $Z_d$ are residual variables. A path diagram of the
postulated model with numerical values for the path coefficients is given
in Figure 7(a).

Table 3 gives the estimators of the path coefficients for the
simple causal chain of Figure 7(a). As was shown in section 1.3 and
illustrated in formulas (1), (2), and (3) of Table 3, the estimators
for path coefficients in the bivariate case are correlation coefficients.
Furthermore, equations (4), (5), and (6) of Table 3 show that the
computation of the residual path coefficient is an immediate result of the
formula for complete determination of the dependent variable. However,
up until now we have not questioned the assumption of recursive
systems that the residuals are uncorrelated. Formulas (7), (8), and (9)
illustrate a condition which must be met by simple causal chains if
that assumption is tenable. That is, if the assumption of uncorrelated
residuals holds, then the observed correlation coefficient of the
alternate or terminal variables of Figure 7(a) must equal the product of
the observed path coefficients connecting them. Duncan (p. 11) gives
the following calculations for Figure 7(a):



    Variables    Calculated     Observed       Difference
                 Correlation    Correlation

    r31          .951           .955            .004
    r42          .972           .971           -.001
    r41          .942           .934           -.008
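The calculated correlations in this comparison can be approximated from the chain's path coefficients, which for a bivariate chain are the adjacent correlations .968, .982, and .990 reported in Table 4. The sketch below assumes those three-decimal values; because Duncan presumably worked from unrounded figures, the product for $r_{41}$ may differ from the tabled value in the third decimal place.

```python
# Path coefficients of the simple causal chain (adjacent correlations
# r21, r32, r43 as reported to three decimals in Table 4).
p21, p32, p43 = 0.968, 0.982, 0.990

# Observed correlations between alternate and terminal variables.
observed = {"r31": 0.955, "r42": 0.971, "r41": 0.934}

# Under uncorrelated residuals each correlation is the product of the
# path coefficients along the chain connecting the two variables.
calculated = {
    "r31": p21 * p32,
    "r42": p32 * p43,
    "r41": p21 * p32 * p43,
}

for name in observed:
    diff = observed[name] - calculated[name]
    print(f"{name}: calculated {calculated[name]:.3f}, "
          f"observed {observed[name]:.3f}, difference {diff:+.3f}")
```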



FIGURE 7.

TABLE 3.

    (1)  r21 = P21
    (2)  r32 = P32
    (3)  r43 = P43
    (4)  r22 = 1 = P21^2 + P2b^2
    (5)  r33 = 1 = P32^2 + P3c^2
    (6)  r44 = 1 = P43^2 + P4d^2
    (7)  r31 = P21 P32
    (8)  r42 = P32 P43
    (9)  r41 = P21 P32 P43

Although the discrepancies between inferred and observed
correlations are small and trivial enough so that we may accept the hypothesis
of a multi-stage, bivariate path model with uncorrelated residuals, Duncan,
to illustrate what should be done in case the discrepancies had been large,
constructed the alternative model shown in Figure 7(b). In this model,
Duncan has dropped the assumption of uncorrelated residuals and computed
the correlations among them which must be postulated to explain the small
discrepancies between the inferred and observed correlations discussed
above. However, the assumption that a residual is uncorrelated
with the immediately preceding variable in the chain holds for 7(b). The
formulas provided by Duncan (p. 12) which yield the desired coefficients
when solved in order are given in Table 4.

TABLE 4: ESTIMATORS OF PATH COEFFICIENTS FOR THE PATH MODEL OF
FIGURE 7(b)

    (1)   r21 = .968 = P21
    (2)   r32 = .982 = P32
    (3)   r43 = .990 = P43
    (4)   r22 = 1 = P21^2 + P2b^2
    (5)   r33 = 1 = P32^2 + P3c^2
    (6)   r44 = 1 = P43^2 + P4d^2
    (7)   r31 = .955 = P21 P32 + P3c rc1
    (8)   r41 = .934 = P43 P32 P21 + P43 P3c rc1 + P4d r1d
    (9)   r42 = .971 = P43 P32 + P21 P4d r1d + P2b P4d rbd
    (10)  rc2 = 0 = P2b rbc + P21 rc1
    (11)  rd3 = 0 = P3c rcd + P32 r2d
          (where r2d = P21 r1d + P2b rbd)

In general, as Duncan points out, if we are considering a
k-variable causal chain, we must estimate k-1 residual paths (3 for
Figure 7(b)), (k-1)(k-2)/2 correlations between residuals (3 for Figure
7(b)), k-1 paths for links in the chain (3 for Figure 7(b)), and k-2
correlations between the initial variable and residuals c, d, ... in the
chain (2 for Figure 7(b)). This yields a total of (k^2 + 3k - 6)/2
quantities to be estimated. For the purpose of estimation, we may
construct k(k-1)/2 equations expressing known correlations in terms of paths
(as in equations (1), (2), (3), (7), (8), and (9) of Table 4), k-1
equations of complete determination (as in equations (4), (5), and (6) of
Table 4) and k-2 equations in which the correlation of a residual with
the immediately preceding variable in the chain is set equal to zero
(equations (10) and (11) of Table 4). This gives (k^2 + 3k - 6)/2
equations, the precise number needed for a unique solution.
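The counting argument can be verified mechanically. The sketch below tallies the unknowns and the equations for several chain lengths and confirms that both equal (k^2 + 3k - 6)/2; the function names `unknowns` and `equations` are our own.

```python
def unknowns(k):
    """Quantities to be estimated in a k-variable causal chain."""
    residual_paths = k - 1
    residual_correlations = (k - 1) * (k - 2) // 2
    chain_paths = k - 1
    initial_residual_correlations = k - 2
    return (residual_paths + residual_correlations
            + chain_paths + initial_residual_correlations)


def equations(k):
    """Equations available: correlations, determinations, zero restrictions."""
    return k * (k - 1) // 2 + (k - 1) + (k - 2)


for k in range(3, 8):
    total = (k * k + 3 * k - 6) // 2
    assert unknowns(k) == equations(k) == total
    print(k, total)
```

For k = 4 both counts equal 11, matching the eleven equations of Table 4.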



The procedure for testing the assumption of uncorrelated
residuals, as illustrated by Duncan's example, may be useful in exploring
relationships among variables which have been traditionally assumed to
form simple causal chains. If, after the uncorrelated residual
assumption is abandoned, the empirical data are still not sufficiently accounted
for, then the assumption of a simple causal chain should be relaxed.
Duncan's comments (p. 12) regarding his example are particularly
appropriate here:



    ... The solution may, of course, include meaningless
    results (e.g., r > 1.0), or results that strain
    one's credulity. In this event, the chain hypothesis
    had best be abandoned or the estimated paths
    modified.

    In the present illustration, the results are
    plausible enough. Both the Counts and the Smith
    studies differed from the two NORC studies and from
    each other in their techniques of rating and sampling.
    A further complication is that the studies
    used different lists of occupations, and the observed
    correlations are based on differing numbers
    of occupations. There is ample opportunity,
    therefore, for correlations of errors to turn up
    in a variety of patterns, even though the chain
    hypothesis may be basically sound. We should
    observe, too, that the residual factors here include
    not only extrinsic disturbances but also
    real though temporary fluctuations in prestige,
    if there be such.

    What should one say, substantively, on the
    basis of such an analysis of the prestige ratings?
    Certainly, the temporal ordering of the
    variables is unambiguous. But whether one wants
    to assert that an aspect of social structure
    (prestige hierarchy) at one date 'causes' its
    counterpart at a later date is perhaps questionable.
    The data suggest there is a high order of
    persistence over time, coupled with a detectable,
    if rather glacial, drift in the structure. The
    calculation of numerical values for the model
    hardly resolves the question of ultimate 'reasons'
    for either the pattern of persistence or the tempo
    of change. These are, instead, questions raised
    by the model in a clear way for further discussion
    and, perhaps, investigation.







3.3. The Path Decomposition Model. Because many of the dependent
variables of interest in the behavioral sciences are composite variables,
we now discuss a use of path analysis which may be referred to as
the path decomposition model. Thus, various tests or scales commonly used
in behavioral research are composed of subscales or subtests. In the
case of such composite dependent variables, it is often of interest (1)
to compute the relative contributions of the component variables to
variation in the composite variable, and (2) to ascertain how independent
variables affect the composite variable via its components.

We shall again utilize an illustrative example provided by
Duncan (pp. 7-10). However, since the subject matter of his example is
rather specific (population density), we shall discuss his model
symbolically; the form of his composite variable has wide generality. The
raw-score definition of the composite variable $V_0$ is

$$ V_0 = V_1 \cdot V_2 \cdot V_3 $$

Let $X_0 = \log V_0$, $X_1 = \log V_1$, $X_2 = \log V_2$, and $X_3 = \log V_3$. Then the
composite variable is an additive combination of $X_1$, $X_2$, and $X_3$:

$$ X_0 = X_1 + X_2 + X_3 $$

If each variable is expressed in standard-score form, we may write

$$ Z_0 = p_{01} Z_1 + p_{02} Z_2 + p_{03} Z_3 $$

as the path decomposition model, where $Z_0, \ldots, Z_3$ are the variables in
standard-score form and $p_{01}$, $p_{02}$, $p_{03}$ are the path coefficients involved







in the determination of $Z_0$ by $Z_1$, $Z_2$, and $Z_3$. In the case of complete
determination by measured variables, the definition of the path
coefficient, as given by formula (1.4), reduces to a ratio of standard
deviations since the partial regression coefficient part of the definition
($c_{ij}$) is unity. Hence, the following numerical values of the path
coefficients, as Duncan indicates (p. 8), can be computed without prior
calculation of correlations:



$p_{01} = S_1/S_0 = .132$      $S_0$ = [illegible in source]      $S_1 = .065$

$p_{02} = S_2/S_0 = .468$                                         $S_2 = .230$

$p_{03} = S_3/S_0 = .821$                                         $S_3 = .405$



The path diagram provided by Duncan (p. 9) for the present
path model is given in Figure 8(a). Table 5 gives Duncan's correlation
matrix for the present problem. The composition of the correlation
of the composite variable with its component parts may now be written
from formula (3.3):



$$ r_{01} = p_{01} + p_{02} r_{12} + p_{03} r_{13} = -.419 $$

$$ r_{02} = p_{01} r_{12} + p_{02} + p_{03} r_{23} = .636 $$

$$ r_{03} = p_{01} r_{13} + p_{02} r_{23} + p_{03} = .923 $$



As Duncan points out (p. 8), this preliminary analysis gives
a clear ordering of the three components in terms of relative importance,
as indicated by the path coefficients, and shows that one of the
components is actually negatively correlated with the composite variable,
because of its negative correlations with the other two components.
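
The logic of the decomposition can be checked numerically. The sketch below uses a small set of hypothetical logged components (they are not Duncan's data), forms the composite as their sum, computes each path coefficient as a ratio of standard deviations, and confirms that the correlations of the composite with its parts follow from the basic theorem. The helper names `mean`, `sd`, and `corr` are introduced here for illustration only.

```python
import math

def mean(v):
    return sum(v) / len(v)

def sd(v):
    m = mean(v)
    return math.sqrt(sum((a - m) ** 2 for a in v) / len(v))

def corr(u, v):
    mu, mv = mean(u), mean(v)
    return sum((a - mu) * (b - mv) for a, b in zip(u, v)) / (len(u) * sd(u) * sd(v))

# Hypothetical logged components X1, X2, X3 (illustrative values only).
x1 = [0.9, 1.1, 1.0, 1.3, 0.8, 1.2]
x2 = [0.5, 0.2, 0.9, 0.1, 1.0, 0.4]
x3 = [2.0, 1.1, 2.9, 1.6, 3.1, 2.2]
x0 = [a + b + c for a, b, c in zip(x1, x2, x3)]    # X0 = X1 + X2 + X3

comps = [x1, x2, x3]
p = [sd(x) / sd(x0) for x in comps]                # p_0i = S_i / S_0
r0 = [corr(x0, x) for x in comps]                  # observed r_0i

# Basic theorem: r_0i = sum over j of p_0j * r_ij.
r0_from_paths = [sum(p[j] * corr(comps[i], comps[j]) for j in range(3))
                 for i in range(3)]

# Complete determination: sum over i of p_0i * r_0i equals 1.
total = sum(pi * ri for pi, ri in zip(p, r0))
print(p, r0, total)
```

A component can have a positive path coefficient and still correlate negatively with the composite when its correlations with the other components are sufficiently negative, which is the situation described for Duncan's first component.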






TABLE 5: CORRELATION MATRIX FOR LOGARITHMS OF VARIABLES IN DUNCAN'S
PATH DECOMPOSITION EXAMPLE

Variable      Z1        Z2        Z3        Z4        Z5

Z0          -.419      .636      .923     -.663     -.391
Z1                  [illeg.]    -.313      .296      .099
Z2                               .303     -.594     -.466
Z3                                       [illeg.]   -.226
Z4                                                   .549

([illeg.] marks entries that are illegible in the source.)



Duncan postulates a second path model to account for the
relationship of the composite variable to two independent variables via its
components. A path diagram for this model is given in Figure 8(b). The
path model is:



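[The first two equations of (3.7) are illegible in the source.]

$$
\begin{aligned}
Z_3 &= p_{34} Z_4 + p_{35} Z_5 + p_{3c} Z_c \\
Z_2 &= p_{24} Z_4 + p_{25} Z_5 + p_{2b} Z_b \\
Z_1 &= p_{14} Z_4 + p_{15} Z_5 + p_{1a} Z_a \\
Z_0 &= p_{01} Z_1 + p_{02} Z_2 + p_{03} Z_3
\end{aligned}
\tag{3.7}
$$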






[FIGURE 8. Path diagrams (a) and (b) for Duncan's path decomposition example; not reproduced.]

It should be noted that zero correlation is not assumed in this
model for the residuals $Z_a$, $Z_b$, and $Z_c$. The path coefficients $p_{34}$, $p_{35}$,
$p_{24}$, $p_{25}$, $p_{14}$, $p_{15}$ are standardized partial regression coefficients
as for other multivariate path models. Furthermore, the residual path
coefficients are given by formula (1.23). Duncan's discussion of the
computation of the residuals (pp. 9-10) is appropriate (using the
symbols of this paper):

    The two independent variables by no means account
    for all the variation in any of the components,
    as may be seen from the size of the residuals,
    $p_{1a}$, $p_{2b}$, and $p_{3c}$.... It is possible,
    nevertheless, for the independent variables to account
    for the intercorrelations of the components, and,
    ideally, one would like to discover independent
    variables which would do just that. The relevant
    calculations concern the correlations between
    residuals. These are obtained from the basic theorem,
    equation [3.3], by writing, for example,

    $$ r_{23} = p_{24} r_{34} + p_{25} r_{35} + p_{2b} p_{3c} r_{bc}, $$

    which may be solved for $r_{bc} = .014$. In this [set-up],
    the correlations between residuals are merely the
    conventional second-order partial correlations.
    [Partial correlations, which otherwise have little]
    utility in path analysis, turn out to be appropriate
    when the question at issue is whether a set
    of independent variables 'explains' the correlation
    between two dependent variables. In the present
    example, while $r_{23} = .303$, we find $r_{bc} =
    r_{23.45} = .014$. Thus the correlation between ...
    $Z_2$ and ... $Z_3$ is satisfactorily explained by the
    respective relationships of these two components
    to $Z_4$ and ... $Z_5$. The same is not true of the
    correlations involving ... $Z_1$, but fortunately
    this is by far the least important component....
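
The equivalence that Duncan describes, between the correlation of the residuals and the conventional second-order partial correlation, can be illustrated with a short Python sketch. The correlations are those of Table 5 as far as they can be read; the value used for $r_{34}$ is an assumption, since that entry is illegible in the source, and is backed out of the term $p_{03} r_{34} = -.424$ with $p_{03} = .821$. The function names are hypothetical helpers.

```python
import math

def partial_first(r_xy, r_xz, r_yz):
    """First-order partial correlation r_xy.z."""
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))

def betas(r_y1, r_y2, r_12):
    """Standardized regression weights of y on two predictors."""
    d = 1 - r_12 ** 2
    return (r_y1 - r_y2 * r_12) / d, (r_y2 - r_y1 * r_12) / d

r23, r24, r25, r35, r45 = 0.303, -0.594, -0.466, -0.226, 0.549
r34 = -0.516   # ASSUMPTION: illegible in the source; approximately -.424 / .821

# Route 1: solve r23 = p24*r34 + p25*r35 + p2b*p3c*r_bc for r_bc.
p24, p25 = betas(r24, r25, r45)
p34, p35 = betas(r34, r35, r45)
p2b = math.sqrt(1 - (p24 * r24 + p25 * r25))   # residual path for Z2
p3c = math.sqrt(1 - (p34 * r34 + p35 * r35))   # residual path for Z3
r_bc = (r23 - p24 * r34 - p25 * r35) / (p2b * p3c)

# Route 2: conventional second-order partial r_23.45 by recursion.
r23_4 = partial_first(r23, r24, r34)
r25_4 = partial_first(r25, r24, r45)
r35_4 = partial_first(r35, r34, r45)
r23_45 = partial_first(r23_4, r25_4, r35_4)

print(r_bc, r23_45)
```

Both routes give the same small value, so nearly all of the observed correlation between the second and third components is accounted for by their common dependence on the two independent variables. The exact figure depends on the assumed $r_{34}$ and need not match the .014 quoted above to the third decimal.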

Finally, Duncan gives the following equations as the most
compact answer to the question of how the effects of the independent
variables are transmitted to the dependent variable via its components (p. 10):






$$ r_{04} = p_{01} r_{14} + p_{02} r_{24} + p_{03} r_{34} = .039 - .278 - .424 = -.663 \tag{3.8} $$

and

$$ r_{05} = p_{01} r_{15} + p_{02} r_{25} + p_{03} r_{35} = .013 - .218 - .185 = -.391 \tag{3.9} $$



From these results, we note that the composite variable is negatively
related to both independent variables, but the effects via the first
component, although small, are positive. Furthermore, the relative
importance of the effects of $Z_4$ and $Z_5$ via the second and third
components is reversed in the two equations. Finally, a more detailed
examination of the transmission of effects can be obtained by
expansion of (3.8) and (3.9) by formula (3.3).
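
The arithmetic of (3.8) and (3.9) is simple enough to tabulate. The sketch below copies the three printed terms of each equation, sums them, and picks out the component that carries the largest absolute effect; the dictionary layout and the helper `largest_channel` are illustrative choices, not part of Duncan's presentation.

```python
# Terms p_0i * r_i4 and p_0i * r_i5, copied as printed in (3.8) and (3.9).
via_z4 = {"component 1": 0.039, "component 2": -0.278, "component 3": -0.424}
via_z5 = {"component 1": 0.013, "component 2": -0.218, "component 3": -0.185}

r04 = round(sum(via_z4.values()), 3)
r05 = round(sum(via_z5.values()), 3)   # sum of rounded terms; the text prints -.391

def largest_channel(terms):
    """Name of the component carrying the largest absolute effect."""
    return max(terms, key=lambda name: abs(terms[name]))

print(r04, largest_channel(via_z4))
print(r05, largest_channel(via_z5))
```

The sum of the rounded terms in (3.9) differs from the printed total in the third decimal, which is consistent with the terms having been rounded before printing.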

The Basic Assumptions of Path Analysis and Model Testing

At this point, it is appropriate to discuss the basic
assumptions of path analysis, answer possible objections to the method, and
mention extensions of the procedure which we have not developed in
this paper.

The Assumptions of Linearity and Additivity. The assumption
of linear, additive relationships among the variables of interest is
made in most applications of correlation and regression analysis. Past
empirical research in a particular problem area is a good basis for
judging the tenability of the linearity and additivity assumptions.
Otherwise, there are simple procedures such as point-plotting to explore
the degree to which the assumptions hold in a specific set of data.
Furthermore, although a relationship may not be linear throughout its
entire range of values, it may be linear within the range of values under
consideration. Finally, there are a number of convenient transformations,
such as the logarithmic transformation of section 3.3, which may be used
to transform data in order to meet the linearity and additivity
assumptions. The important point to emphasize is that these are assumptions
which are specific to the particular area of investigation and must be







not ordinarily willing to assume that a change in a variable caused a
prior change in another variable. A second source of information for the
causal ordering of the variables may be existing experimental or
case-study results. Finally, the theoretical assumptions of the particular
substantive area provide a third source for the asymmetry assumption.
Following Wright (quotation in section 1), we view a causal assumption
as less of an assertion about empirical reality than as a strategy for
inquiry. Path analysis, by itself, cannot prove the validity of a set
of causal assumptions. It can only give the consequences of an assumed
causal sequence for a set of data. It is, of course, the responsibility
of the researcher to defend his causal assumptions in a given empirical
study.

A particular type of behavioral science relationship for which
the asymmetric path model may not be appropriate is the so-called
"interdependent" relationship. That is, we often encounter variables such
that a change in one causes a change in another which, in turn, leads
to a change in the former, etc. The relationship of level of education
and racial discrimination provides a common sociological example. At an
original time of observation, some element in the social system may cause
an increase in the level of Negro education. This increase in level of
education may cause a decrease in racial discrimination. Furthermore, the
decrease in racial discrimination may feed back through a number of
mechanisms to cause another increase in the opportunity for, and level of, Negro
education. Thus, the cycle may continue until an equilibrium position
of the variables is established. How can we handle this type of
relationship with asymmetric path models? The answer would seem to depend on the
rapidity of the feedback relationship. In the case of a relatively slow
feedback process, our asymmetric path models will give us a crude, but
perhaps meaningful, cross-section snapshot of the system which only
approximates the "real" causal structure. Furthermore, the path model
will have to be continuously updated over time as the regression
relationships change. For those systems of relationships in which the
feedback is relatively rapid, however, our asymmetric path models must be
modified. Wright (1960b) has indicated how this may be done in certain
specific types of problems. Economists also, with the problem of the
rapid adjustment of market forces, have dealt at considerable length with
this type of model. The present author intends to add a special section
on feedback models to this paper in the near future.

The Testing of Models. Because of our emphasis on the
representational and interpretive uses of path models, we have not addressed
ourselves to the problems of tests of significance for path coefficients
and testing procedures for alternative models in this paper. These topics
also demand treatment in a special section. As a rule of thumb, however,
we propose, particularly for small samples, that a criterion of at least
0.10 be set for the retention of a path coefficient in a model. This
means that the variable must account for at least 10 per cent of the
standard deviation, and 1 per cent of the variance, of the endogenous
variable. A criterion of this level should not lead to the premature
rejection of too many important variables.






ADDITIONAL READING 

The topics discussed in this paper are referenced in a wide
variety of publications in biometrics, econometrics, and statistics.
If the reader desires to read their treatment in other sources, he may
begin with the publications listed in the bibliography. Any of the
articles by Wright are worthwhile discussions of path analysis. However,
his June 1960 Biometrics article is a particularly good summary
treatment. The articles by Duncan, Kempthorne, Tukey, and Turner and
Stevens also provide good treatments of path analysis from somewhat
different perspectives.

Recursive systems of equations are discussed on an elementary
level in the book by Blalock. However, a somewhat more technical
treatment is given in the Simon and Wold and Jureen references.






APPENDIX I 

THE MATHEMATICS OF BIVARIATE CORRELATION AND REGRESSION

In what follows, we have not sought to develop a distinct
population theory and a distinct sample theory of correlation and
regression. We deal primarily with sample theory and utilize population theory
in a heuristic fashion. Consider the case in which we have observations
on two continuous, intervally-measured variables X and Y in raw-score form.
Then we may define the correlation coefficient as follows:

Definition A.1.1. Suppose that X and Y are continuous,
intervally-measured variables. Then the statement that r is the correlation
coefficient of the observed values of the X and Y variables means that
r is a number and that r is a measure of the degree of linear covariation
of the observed values of X and Y such that

$$ r = \frac{\sum (X - M_x)(Y - M_y)}{N S_x S_y} = \frac{\sum xy}{N S_x S_y} \tag{A1.1} $$

where X and Y are the observed values of the variables in raw-score form,
$M_x$ and $M_y$ are the means of the raw scores for the X and Y variables,
$S_x$ and $S_y$ are the standard deviations of the observed values of X and Y, and
x and y are the observed values of the X and Y variables in
deviation-score form, i.e., $x = X - M_x$ and $y = Y - M_y$.

Now we would like to develop a means for predicting Y values
for given X values and for interpreting the correlation coefficient. We
begin by assuming that the X and Y variables are linearly related, i.e.,
we assume that the functional form of the relationship of Y and X in the









population of X and Y values is a line of the form $Y = A'' + B''X$, where
$A''$ is the Y intercept (the value of Y where the line crosses the Y axis) and
$B''$ is the slope of the line (the inclination of the line to the X axis).

At this point, then, our problem is merely to develop
estimators of the $A''$ and $B''$ coefficients from the observed X and Y values.
Of the several methods of estimation, we shall follow traditional
practice here and adopt as a basis for an estimated best-fit line the
criterion that the sum of the squares of the deviations from the line shall
be as small as possible. In symbols, let $Y' = A + BX$, where $Y'$ (read
Y-prime) is the value estimated, for a given X, of the Y variable, A is
the estimated value for $A''$, B is the estimated value for $B''$, and let
Y be the observed value of the Y variable. Then $(Y - Y')^2$ represents the
squared deviation of any Y from the estimated value. Our problem is to
choose the estimators A and B so as to make $\sum (Y - Y')^2$ as small as
possible. We shall find it more convenient to deal with both the equation,
$y' = a + bx$, and the sum, $\sum (y - y')^2$, in deviation units, with $y'$ and y
as deviations from $M_y$ and $x = X - M_x$. This is merely a translation of
reference axes to make the origin coincide with $M_x$ and $M_y$. Therefore,
the value of a in the equation becomes zero and we shall drop it from
further consideration.

This allows us to write $y' = bx$ as the equation for estimating
y, in deviation units, from x, the deviation values of X. Our problem
now is that of determining the value of b which will make $\sum (y - y')^2$
a minimum. It will be simple, once the optimal value for b has been
determined, to pass back to the original reference frame, the
gross-score axes, by substituting for $y'$ the value $Y' - M_y$, and for x, $X - M_x$.

The following derivation of the optimal value for b more or
less follows that given by McNemar (pp. 122-3). We begin by setting
up the function

$$ f(b) = \frac{\sum (y - y')^2}{N} = \frac{\sum (y - bx)^2}{N} $$

in which we have N deviations of the form $y - y'$ or $y - bx$. The sum of
these squared deviations divided by N gives us the function which we
want to minimize by the proper choice of b. We shall choose the
proper value of b by utilizing a theorem from the calculus. According to
this theorem, we may minimize the above function by taking its
derivative with respect to b, setting this derivative equal to zero, and then
solving for b. Thus

$$ \frac{df}{db} = \frac{-2 \sum x(y - bx)}{N} $$

which, set equal to zero and divided by -2, yields

$$ \frac{\sum x(y - bx)}{N} = 0 $$

or

$$ \frac{\sum (xy - bx^2)}{N} = 0 $$

then

$$ \frac{\sum xy}{N} - b\,\frac{\sum x^2}{N} = 0 $$



The first term involves the correlation coefficient as defined by
formula (A1.1), from which definitional formula we see that $\sum xy/N = r S_x S_y$,
and since $\sum x^2/N = S_x^2$, we have

$$ r S_x S_y - b S_x^2 = 0 $$

or

$$ r S_y - b S_x = 0 $$

which gives

$$ b = r \frac{S_y}{S_x} $$

as the proper value for b. We therefore have

$$ y' = r \frac{S_y}{S_x}\, x \tag{A1.2} $$

as the equation for the best-fit line in deviation-score form. By proper
substitution, we have

$$ Y' - M_y = r \frac{S_y}{S_x} (X - M_x) $$

or

$$ Y' = r \frac{S_y}{S_x} X + \left( M_y - r \frac{S_y}{S_x} M_x \right) \tag{A1.3} $$

as the equation in terms of the original or raw scores. It is this form
which we would use in predicting Y from X. Note that $B = b = r(S_y/S_x)$ is
the slope of the line and that the constant A is the term in parentheses.
Furthermore, note that we can get another form of equation (A1.2) by dividing
both sides of the expression by $S_y$:






$$ \frac{y'}{S_y} = r \frac{x}{S_x} $$

The observant reader will recognize $y'/S_y$ and $x/S_x$ as the (predicted) Y
and X variables in standard-score form, or

$$ Z'_y = r Z_x \tag{A1.4} $$

or

$$ Z'_y = B^* Z_x \tag{A1.5} $$

where $B^* = r$ is usually referred to as the standardized regression
coefficient or beta weight. It is usually symbolized by the Greek letter
beta, but, since we do not have a Greek letter for beta on our keyboard,
we shall denote it by $B^*$. Thus, we have three formulas for the line of
best fit - (A1.2), (A1.3), and (A1.5) - corresponding to deviation-,
observed-, and standard-score units, respectively.
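
The result of the derivation can be checked on a small set of hypothetical scores. The sketch below computes r from (A1.1), forms the slope $b = r(S_y/S_x)$ and the raw-score intercept of (A1.3), and confirms that nudging the slope in either direction increases the sum of squared deviations. The data and the helper `sse` are illustrative assumptions; standard deviations use N in the denominator, as in the text.

```python
import math

X = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]     # hypothetical raw scores
Y = [2.1, 2.9, 4.2, 4.8, 6.1, 6.4]
N = len(X)

Mx, My = sum(X) / N, sum(Y) / N
x = [v - Mx for v in X]                # deviation scores
y = [v - My for v in Y]
Sx = math.sqrt(sum(v * v for v in x) / N)
Sy = math.sqrt(sum(v * v for v in y) / N)

r = sum(a * c for a, c in zip(x, y)) / (N * Sx * Sy)   # (A1.1)
b = r * Sy / Sx                                        # slope in (A1.2)
A = My - b * Mx                                        # intercept in (A1.3)

def sse(slope):
    """Sum of squared deviations from the line y' = slope * x."""
    return sum((yi - slope * xi) ** 2 for xi, yi in zip(x, y))

print(r, b, A, sse(b))
```

In standard-score units the same line has slope r, which is the content of (A1.4) and (A1.5).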

We note two additional relations. First, we derive a formula
for $B_{yx} = b$ - the slope of the least-squares equation estimating the
regression of Y on X - in terms of deviation scores. From (A1.3), we
have

$$ B_{yx} = r \frac{S_y}{S_x} = \frac{\sum xy}{N S_x S_y} \cdot \frac{S_y}{S_x} = \frac{\sum xy / N}{S_x^2} \tag{A1.6} $$



Second, just as we can estimate Y values from X values, we can extend
our equations to the estimation of X values from Y values, although we
may never desire to do this in practice. Then we may write the
equation for estimating X from Y as follows:

$$ X' = A_{xy} + B_{xy} Y $$

where $A_{xy} = M_x - r \frac{S_x}{S_y} M_y$ and $B_{xy} = \frac{\sum xy / N}{S_y^2} = r \frac{S_x}{S_y}$. Then we
may write the product

$$ B_{yx} B_{xy} = r \frac{S_y}{S_x} \cdot r \frac{S_x}{S_y} = r^2 \tag{A1.7} $$

At this point, it is profitable to review three properties or
interpretations of the correlation coefficient.

A.1.1. Rate of Change. It is obvious from equation (A1.4) that
the correlation coefficient may be interpreted as a rate of change - the
amount of change in variable Y per unit of change in variable X - in
standard-score form. It may also be shown that the correlation
coefficient has bounds of +1 and -1. Hence, the largest possible change in Y,
given a change of one standard deviation in X, is plus or minus one standard
deviation. It should also be noted from equations (A1.2) and (A1.3) that,
if the standard deviations of the X and Y variables are equal, then the
correlation coefficient may be interpreted as a rate of change for the
variables in deviation- or observed-score units.

A.1.2. Accuracy of Prediction. The next property or
interpretation of the correlation coefficient concerns the accuracy of
prediction by means of the regression equation. That is, we would like to be
able to set up confidence bands or intervals about the regression line
or predicted y-values in which we can have confidence that most of the
actual observations fall with a specified degree of probability. In order
to accomplish this goal, we need a measure of dispersion for the
regression line comparable to, say, the standard error of estimate of the mean
of a distribution. By introducing an assumption we may derive such a
figure. We must assume that the standard deviations of the distribution
of Y values for each value of X are equal in the population from which
the sample is drawn. This assumption of homoscedasticity implies that,
if we had a much larger sample size, the standard deviations of Y-values
for each X value would be very nearly equal.

Note that $y - y'$ (or $Y - Y'$) represents the discrepancy between
estimated and observed values and that $\sum (y - y')^2 / N$ is the mean of the
squared deviations, the root of which will be the standard deviation of
the discrepancies between estimated and observed values. This
particular standard deviation shall be called the standard error of estimate
of the regression of Y on X and shall be denoted by $S_{y.x}$. We may derive
an algebraic form for this expression as follows. By definition,

$$ S_{y.x}^2 = \frac{\sum (y - y')^2}{N} $$

but

$$ y' = r \frac{S_y}{S_x}\, x $$

from (A1.2), so



















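$$
\begin{aligned}
S_{y.x}^2 &= \frac{1}{N} \sum \left( y - r \frac{S_y}{S_x} x \right)^2 \\
&= \frac{1}{N} \sum \left( y^2 - 2r \frac{S_y}{S_x} xy + r^2 \frac{S_y^2}{S_x^2} x^2 \right) \\
&= S_y^2 - 2 r^2 S_y^2 + r^2 S_y^2 \\
&= S_y^2 (1 - r^2)
\end{aligned}
$$

then

$$ S_{y.x} = S_y \sqrt{1 - r^2} \tag{A1.8} $$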



Hence, we have a second procedure for interpreting the correlation
coefficient in terms of the accuracy of prediction or closeness
of fit of the regression line to the data. If no correlation exists, we
see that the error of estimate is $S_y$. We should also note that the term in
(A1.8) which involves r, $\sqrt{1 - r^2}$, is the coefficient of alienation.
Observe that if r = 0, its value is 1 and the error of estimate is $S_y$.

A.1.3. Variance and Correlation. It may be shown (see McNemar,
p. 129) that the variance of a sum (or difference) of two independent
variables is equal to the sum of their separate variances. Furthermore,
it may be shown (see McNemar, p. 130) that the predicted $y'$ and the
residual $(y - y')$ are independent. Therefore, we have $y = y' + (y - y')$ and,
since the two parts are independent,

$$ S_y^2 = S_{y'}^2 + S_{y.x}^2 \tag{A1.9} $$

where $S_{y.x}^2$ is the variance of the residuals, $(y - y')$. Dividing both sides
of this equation by $S_y^2$ we get

$$ 1 = \frac{S_{y'}^2}{S_y^2} + \frac{S_{y.x}^2}{S_y^2} \tag{A1.10} $$

from which we see that, since the two ratios add to unity, either one can
be interpreted as a proportion. In short, the ratio of $S_{y'}^2$ to $S_y^2$ is the
proportion of the variance in Y which can be predicted from X, and the
ratio of $S_{y.x}^2$ to $S_y^2$ represents the proportion of the variance of Y which
is left over or remains or cannot be predicted from X. This is the same
variance which results if we square formula (A1.8):

$$ S_{y.x}^2 = S_y^2 (1 - r^2) $$

or

$$ \frac{S_{y.x}^2}{S_y^2} = 1 - r^2 $$

Hence, we may substitute this value in (A1.10):

$$ 1 = \frac{S_{y'}^2}{S_y^2} + 1 - r^2 $$

from which we have the ratio

$$ \frac{S_{y'}^2}{S_y^2} = r^2 \tag{A1.11} $$

In short, the square of the correlation coefficient gives the proportion of
the total variance of Y which is attributable to variance in X. Also, the
proportion of variance in Y which is due to variables other than X is $1 - r^2$.
This is a third possible interpretation of r.
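
The second and third interpretations can be verified together on hypothetical data. The sketch below splits each deviation score y into its predicted part and its residual, and checks the additivity of variances in (A1.9), the standard error of estimate in (A1.8), and the proportion of variance in (A1.11). The data are invented for illustration, and all variances use N in the denominator, as in the text.

```python
import math

X = [2.0, 4.0, 5.0, 7.0, 8.0, 10.0, 11.0]   # hypothetical raw scores
Y = [1.5, 3.9, 3.1, 6.2, 5.8, 8.4, 7.9]
N = len(X)

Mx, My = sum(X) / N, sum(Y) / N
x = [v - Mx for v in X]
y = [v - My for v in Y]
Sx = math.sqrt(sum(v * v for v in x) / N)
Sy = math.sqrt(sum(v * v for v in y) / N)
r = sum(a * c for a, c in zip(x, y)) / (N * Sx * Sy)

y_hat = [r * Sy / Sx * xi for xi in x]            # predicted part, from (A1.2)
resid = [yi - yh for yi, yh in zip(y, y_hat)]     # residual part

var_y = Sy ** 2
var_hat = sum(v * v for v in y_hat) / N           # variance of predicted values
var_res = sum(v * v for v in resid) / N           # squared standard error of estimate

print(var_hat / var_y, r ** 2)                    # predictable proportion and r squared
print(math.sqrt(var_res), Sy * math.sqrt(1 - r ** 2))
```

The two printed pairs agree, so the coefficient of alienation and the squared correlation are two views of the same partition of variance.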






APPENDIX II

THE MATHEMATICS OF MULTIVARIATE CORRELATION AND REGRESSION



We shall now extend our discussion to the multivariate case in
which we attempt to predict one variable by using several other
variables and to analyze its variance into component parts. We shall find
some similarities to the bivariate case, but we shall also encounter
some significant differences. Again, parts of our derivations more or
less follow those of McNemar (pp. 169-178).

A.2.1. The Three-Variable Case. Consider first the problem
of predicting $X_1$ (criterion) from a knowledge of $X_2$ and $X_3$ (predictors).
Geometrically, we can imagine this as involving three reference axes
instead of two as in the bivariate case. Here we can think of the
vertical axis as representing $X_1$ and the two horizontal axes as
representing $X_2$ and $X_3$. We begin by assuming that the relationship of $X_1$ to $X_2$
and $X_3$ is linear, i.e., we assume that the functional form of the
relationship of the variables in the population of $X_1$, $X_2$, and $X_3$ values
is of the form $X_1 = A'' + B''_2 X_2 + B''_3 X_3$, in which $A''$ is the
(population) value of $X_1$ where the plane representing the predicted values
of $X_1$ cuts the $X_1$ axis (where $X_2$ and $X_3$ are zero), and $B''_2$ and $B''_3$ are
the inclinations (population values) of the plane of predicted $X_1$ values
to the $X_2$ and $X_3$ axes, respectively (the expected changes in $X_1$, given
a unit of change in $X_2$ and $X_3$, respectively).

As in the bivariate case, our problem is to estimate $A''$, $B''_2$,
and $B''_3$. Again, this is a least-squares affair - the sum of the squares
of the errors of estimate shall be a minimum. In short, we desire values
for A, $B_2$, and $B_3$ in the equation

$$ X'_1 = A + B_2 X_2 + B_3 X_3 $$

or, equivalently, the values $b_2$ and $b_3$ in the deviation-unit equation

$$ x'_1 = b_2 x_2 + b_3 x_3 $$

such that the sum

$$ \sum (X_1 - X'_1)^2 = \sum (x_1 - x'_1)^2 $$

is a minimum.



The task of derivation is somewhat simplified if we transform
all three sets of values into standard-score form, i.e., if we set
$Z_i = (X_i - M_i)/S_i$. Then our equation becomes

$$ Z'_1 = B^*_2 Z_2 + B^*_3 Z_3 \tag{A2.1} $$

where $B^*_j$ represents the partial regression coefficient in standard-score
form. As for the bivariate case, these regression weights are usually
called beta coefficients or standardized regression coefficients and
denoted by the Greek letter beta. Since we are changing the size of our
unit of measure, it should be noted that, say, $B^*_2$ will not necessarily
equal $B_2 = b_2$. Now we need to determine the values of $B^*_2$ and $B^*_3$ such that
the average of the squared errors, or

$$ \frac{1}{N} \sum (Z_1 - Z'_1)^2 $$

shall be a minimum. Since $Z_1 - Z'_1 = Z_1 - B^*_2 Z_2 - B^*_3 Z_3$, the function
to be minimized is

$$ f = \frac{1}{N} \sum (Z_1 - B^*_2 Z_2 - B^*_3 Z_3)^2 $$

As in the bivariate case, the calculus is used to determine
the values of $B^*_2$ and $B^*_3$ which will make this function a minimum. Taking
the partial derivative of the function first with respect to $B^*_2$ and then
with respect to $B^*_3$:

$$ \frac{\partial f}{\partial B^*_2} = \frac{-2 \sum Z_2 (Z_1 - B^*_2 Z_2 - B^*_3 Z_3)}{N} $$

$$ \frac{\partial f}{\partial B^*_3} = \frac{-2 \sum Z_3 (Z_1 - B^*_2 Z_2 - B^*_3 Z_3)}{N} $$





These two derivatives must be set equal to zero and then solved
simultaneously for the unknowns, $B^*_2$ and $B^*_3$. By performing the indicated
multiplications, summing, and dividing each equation by 2, we get

$$ -\frac{\sum Z_1 Z_2}{N} + B^*_2 \frac{\sum Z_2^2}{N} + B^*_3 \frac{\sum Z_2 Z_3}{N} = 0 $$

$$ -\frac{\sum Z_1 Z_3}{N} + B^*_2 \frac{\sum Z_2 Z_3}{N} + B^*_3 \frac{\sum Z_3^2}{N} = 0 $$

Noting that the sum of squares of standard scores divided by N is unity,
whereas any sum of cross products of standard scores divided by N is the
correlation between the two variables involved in the cross products, we have by
application to the above equations

$$ -r_{12} + B^*_2 + B^*_3 r_{23} = 0 $$

$$ -r_{13} + B^*_2 r_{23} + B^*_3 = 0 $$

or

$$ B^*_2 + r_{23} B^*_3 - r_{12} = 0 $$

$$ r_{23} B^*_2 + B^*_3 - r_{13} = 0 $$

Since the r's in the above equations are determinable for any specific
set of data, we may treat them as knowns, leaving only the $B^*$'s as unknowns.
Solution of these two simultaneous equations in two unknowns gives

$$ B^*_2 = \frac{r_{12} - r_{13} r_{23}}{1 - r_{23}^2} $$

$$ B^*_3 = \frac{r_{13} - r_{12} r_{23}}{1 - r_{23}^2} $$
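
The solution of the two normal equations can be expressed as a small function. The sketch below, with hypothetical correlations and a function name (`beta_weights`) introduced only for illustration, computes the two beta weights and substitutes them back into the normal equations to confirm that both are satisfied.

```python
def beta_weights(r12, r13, r23):
    """Standardized partial regression weights for predicting variable 1 from 2 and 3."""
    d = 1 - r23 ** 2
    return (r12 - r13 * r23) / d, (r13 - r12 * r23) / d

# Hypothetical correlations among a criterion (1) and two predictors (2, 3).
r12, r13, r23 = 0.60, 0.50, 0.30
B2, B3 = beta_weights(r12, r13, r23)

# Left-hand sides of the two normal equations; both should vanish.
eq1 = -r12 + B2 + B3 * r23
eq2 = -r13 + B2 * r23 + B3
print(B2, B3, eq1, eq2)

# With uncorrelated predictors the weights reduce to the simple correlations.
print(beta_weights(0.60, 0.50, 0.0))
```

The second call shows the special case noted implicitly by the formulas: when $r_{23} = 0$, each beta weight equals the corresponding bivariate correlation.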

As soon as the values of $B^*_2$ and $B^*_3$ have been determined, they
can be substituted in the prediction equation

$$ Z'_1 = B^*_2 Z_2 + B^*_3 Z_3 $$

so that for a given pair of $Z_2$ and $Z_3$ values we can predict the standard
score on the criterion variable. However, it is often more convenient to
deal with deviation or raw scores. Hence, by replacing the Z's in the
preceding equations by their values in terms of raw scores, means, and
standard deviations, we will have

$$ \frac{X'_1 - M_1}{S_1} = B^*_2 \frac{X_2 - M_2}{S_2} + B^*_3 \frac{X_3 - M_3}{S_3} \tag{A2.2} $$

or

$$ \frac{X'_1}{S_1} - \frac{M_1}{S_1} = B^*_2 \frac{X_2}{S_2} - B^*_2 \frac{M_2}{S_2} + B^*_3 \frac{X_3}{S_3} - B^*_3 \frac{M_3}{S_3} $$

After multiplying by $S_1$ and rearranging terms, we have

$$ X'_1 = B^*_2 \frac{S_1}{S_2} X_2 + B^*_3 \frac{S_1}{S_3} X_3 + \left( M_1 - B^*_2 \frac{S_1}{S_2} M_2 - B^*_3 \frac{S_1}{S_3} M_3 \right) \tag{A2.3} $$

as the regression equation in raw-score form. From this equation, we see
that our original $B_2$ must equal $B^*_2 (S_1/S_2)$, $B_3 = B^*_3 (S_1/S_3)$, and A = the
parentheses term. We may also derive values for the partial regression
weights in terms of bivariate regression weights from these equations as
follows:



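$$ B_2 = B^*_2 \frac{S_1}{S_2} = \frac{S_1}{S_2} \cdot \frac{r_{12} - r_{13} r_{23}}{1 - r_{23}^2} $$

and, since $r_{23}^2 = B_{23} B_{32}$ by formula (A1.7),

$$
\begin{aligned}
B_2 &= \frac{S_1}{S_2} \cdot \frac{r_{12} - r_{13} r_{23}}{1 - B_{23} B_{32}} \\
&= \frac{\dfrac{S_1}{S_2} \cdot \dfrac{\sum x_1 x_2 / N}{S_1 S_2} - \dfrac{S_1}{S_2} \left( \dfrac{\sum x_1 x_3 / N}{S_1 S_3} \cdot \dfrac{\sum x_2 x_3 / N}{S_2 S_3} \right)}{1 - B_{23} B_{32}} \\
&= \frac{\dfrac{\sum x_1 x_2 / N}{S_2^2} - \dfrac{\sum x_1 x_3 / N}{S_3^2} \cdot \dfrac{\sum x_2 x_3 / N}{S_2^2}}{1 - B_{23} B_{32}}
\end{aligned}
$$

or, since $B_{12} = \dfrac{\sum x_1 x_2 / N}{S_2^2}$ by formula (A1.6), this reduces to

$$ B_2 = \frac{B_{12} - (B_{13})(B_{32})}{1 - B_{23} B_{32}} \tag{A2.4} $$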



Thus, we have an equation for the raw- or deviation-score partial
regression coefficients in terms of bivariate raw- or deviation-score
regression coefficients. The notation for this coefficient is often written
$B_{12.3}$ to indicate that it is the regression of variable 1 on variable 2,
controlling for variable 3 (the expected change in variable 1, given a
unit of change in variable 2, controlling for variable 3). Generalizing
this notation, we may write

$$ B_{ij.k} = \frac{B_{ij} - (B_{ik})(B_{kj})}{1 - (B_{jk})(B_{kj})} \tag{A2.5} $$

as the equation for the partial regression coefficient in raw- or
deviation-score form of variable i on variable j, controlling for variable k.
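
Formula (A2.5) can be compared with a direct least-squares fit. In the sketch below the three variables are hypothetical, `sums` is a helper introduced here for sums of cross products of deviation scores, and the direct solution uses Cramer's rule on the two normal equations in deviation form. Both routes should give the same coefficient for variable 2.

```python
def sums(u, v):
    """Sum of cross products of deviation scores of u and v."""
    mu, mv = sum(u) / len(u), sum(v) / len(v)
    return sum((a - mu) * (b - mv) for a, b in zip(u, v))

X1 = [3.0, 5.0, 4.0, 7.0, 8.0, 6.0, 9.0, 10.0]   # hypothetical criterion
X2 = [1.0, 2.0, 2.0, 3.0, 5.0, 4.0, 5.0, 6.0]    # hypothetical predictors
X3 = [2.0, 1.0, 4.0, 3.0, 3.0, 6.0, 5.0, 7.0]

s12, s13, s23 = sums(X1, X2), sums(X1, X3), sums(X2, X3)
s22, s33 = sums(X2, X2), sums(X3, X3)

# Bivariate slopes as in (A1.6): B_ij is the regression of i on j.
B12, B13 = s12 / s22, s13 / s33
B23, B32 = s23 / s33, s23 / s22

B12_3 = (B12 - B13 * B32) / (1 - B23 * B32)      # (A2.5) with i=1, j=2, k=3

# Direct least-squares solution of the normal equations (Cramer's rule).
det = s22 * s33 - s23 ** 2
b2_direct = (s12 * s33 - s13 * s23) / det

print(B12_3, b2_direct)
```

The agreement follows algebraically: multiplying the numerator and denominator of (A2.5) by $S_2^2 S_3^2$ gives exactly the Cramer's-rule expression.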

We can ascertain the accuracy of the prediction of $X_1$ from the
best combination of $X_2$ and $X_3$ by examining the error of prediction, i.e.,
$X_1 - X'_1$ or $Z_1 - Z'_1$. The sum of the squares of the errors divided by
N will yield the variance of the errors. The square root of this variance
would correspond to the standard error of estimate. Let $S_{z1.23}$ be the
standard error (in Z-units) for predicting $X_1$ from $X_2$ and $X_3$, i.e., let
$S_{z1.23}$ be the standard deviation of the residual terms (in z-units). Then

$$
\begin{aligned}
S_{z1.23}^2 &= \frac{\sum (Z_1 - Z'_1)^2}{N} = \frac{\sum (Z_1 - B^*_2 Z_2 - B^*_3 Z_3)^2}{N} \\
&= \frac{\sum Z_1^2}{N} + (B^*_2)^2 \frac{\sum Z_2^2}{N} + (B^*_3)^2 \frac{\sum Z_3^2}{N} - 2 B^*_2 \frac{\sum Z_1 Z_2}{N} - 2 B^*_3 \frac{\sum Z_1 Z_3}{N} + 2 B^*_2 B^*_3 \frac{\sum Z_2 Z_3}{N} \\
&= 1 + (B^*_2)^2 + (B^*_3)^2 - 2 B^*_2 r_{12} - 2 B^*_3 r_{13} + 2 B^*_2 B^*_3 r_{23}
\end{aligned}
\tag{A2.6}
$$

which by algebraic manipulation reduces to

$$ S_{z1.23}^2 = 1 - (B^*_2 r_{12} + B^*_3 r_{13}) \tag{A2.7} $$

in terms of standard scores. Of course, $S_1^2$ times this would give the error
variance for raw scores.

We procead to define tlie multiple correlation coafflclant as the 
correlation between Z|^ and the best estimate of Z^ from a knowledge of 
Z 2 and Z^* In symbols^ 



^1.23 



»«i*i 



EZi Z- 



“ S.1 S,j 



Z Zj Zg + If Zg^ 



(A2.8) 



N S,» 

*1 



l!l6ta that, for a sample of values • U However, It does not follow 



that • 1* In order to evaluate this last S, we may think of Z^ as 



being made up of two parts - that which we can estimate plus a residual t 



K **■ ^1.23 



It can be shown that these two parts are Independent of each other* Hence, 



their variances are additive t 



$$
S^2_{z_1} = S^2_{z_1'} + S^2_{z_1 \cdot 23}
$$

or

$$
1 = S^2_{z_1'} + S^2_{z_1 \cdot 23}
$$

then

$$
S^2_{z_1'} = 1 - S^2_{z_1 \cdot 23}
$$



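However, $S^2_{z_1 \cdot 23}$ is nothing more than the variance of the prediction errors
as given by (A2.7); hence this becomes by substitution of (A2.7)

$$
S^2_{z_1'} = B^*_2 r_{12} + B^*_3 r_{13} \tag{A2.9}
$$

Then, by substituting (A2.9) in (A2.8), we have

$$
\begin{aligned}
R_{1 \cdot 23} &= \frac{\sum Z_1 (B^*_2 Z_2 + B^*_3 Z_3)}{N \sqrt{B^*_2 r_{12} + B^*_3 r_{13}}} \\
&= \frac{B^*_2 \sum Z_1 Z_2 + B^*_3 \sum Z_1 Z_3}{N \sqrt{B^*_2 r_{12} + B^*_3 r_{13}}} \\
&= \frac{B^*_2 r_{12} + B^*_3 r_{13}}{\sqrt{B^*_2 r_{12} + B^*_3 r_{13}}} \\
&= \sqrt{B^*_2 r_{12} + B^*_3 r_{13}}
\end{aligned}
\tag{A2.10}
$$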

From formulas (A2.10) and (A2.7), we see that we can write the
standard error of estimate in raw-score form as

$$
S_{1 \cdot 23} = S_1 \sqrt{1 - R^2_{1 \cdot 23}} \tag{A2.11}
$$

This formula may be used to define the multiple correlation coefficient.
The relationship is

$$
R^2_{1 \cdot 23} = 1 - \frac{S^2_{1 \cdot 23}}{S^2_1} \tag{A2.12}
$$

By substituting from (A2.7), we again have (A2.10).
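The two routes to the multiple correlation coefficient, correlating $Z_1$ with its predicted values and taking the square root of $B^*_2 r_{12} + B^*_3 r_{13}$, can be compared on data. This is a minimal sketch under stated assumptions: the eight observations are hypothetical, variances are computed with N in the denominator as in the text, and the helper names are ours.

```python
from math import sqrt
from statistics import mean, pstdev

def zscores(x):
    """Standard scores using the N-denominator standard deviation."""
    m, s = mean(x), pstdev(x)
    return [(v - m) / s for v in x]

def corr(x, y):
    """Product-moment correlation as the mean product of z scores."""
    zx, zy = zscores(x), zscores(y)
    return sum(a * b for a, b in zip(zx, zy)) / len(x)

# Hypothetical raw scores for the criterion (x1) and two predictors.
x1 = [3.0, 5.0, 4.0, 8.0, 9.0, 12.0, 10.0, 14.0]
x2 = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
x3 = [2.0, 1.0, 4.0, 3.0, 7.0, 5.0, 6.0, 9.0]

r12, r13, r23 = corr(x1, x2), corr(x1, x3), corr(x2, x3)
d = 1.0 - r23 ** 2
b2 = (r12 - r13 * r23) / d   # beta weight for variable 2
b3 = (r13 - r12 * r23) / d   # beta weight for variable 3

z1, z2, z3 = zscores(x1), zscores(x2), zscores(x3)
z1_hat = [b2 * a + b3 * b for a, b in zip(z2, z3)]

R_formula = sqrt(b2 * r12 + b3 * r13)   # closed form for R
R_direct = corr(z1, z1_hat)             # correlation of Z1 with its estimate

resid_var = sum((a - b) ** 2 for a, b in zip(z1, z1_hat)) / len(z1)
s1 = pstdev(x1)
se_raw = s1 * sqrt(1.0 - R_formula ** 2)  # raw-score standard error of estimate
print(R_formula, R_direct, se_raw)
```

In this sketch the residual variance in z-units should equal one minus the squared multiple correlation, which is the relationship the raw-score standard error of estimate relies on.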



At this point, we may note the similarity of formula (A2.11) to
the standard error of estimate for the bivariate situation. Thus, the
interpretation of the correlation coefficient in terms of reduction in the
error of estimate holds for the multiple correlation coefficient in
exactly the same manner as for the ordinary bivariate correlation
coefficient. Furthermore, the interpretation in terms of proportion of
variance explained also holds for the multiple correlation coefficient.
However, in the case of two predictor variables, we find some
interesting differences. Let us explore those peculiarities.

We must answer the question as to the relative importance of
the two predictor variables as contributors to variation in the criterion
variable. Obviously, the B coefficients in the raw-score regression
equation cannot be interpreted as indicating the relative contribution
of the two independent variables, since the two B coefficients usually
involve different units of measurement. Therefore, a $B_2$ twice as large
as $B_3$ does not imply that $X_2$ is twice as important as $X_3$. However, the
variables in standard-score form will be comparable, and hence the beta
coefficients in the standard-score form of the regression equation will
be comparable. Since



$$
S^2_{z_1} = S^2_{z_1'} + S^2_{z_1 \cdot 23}
$$

or

$$
1 = S^2_{z_1'} + S^2_{z_1 \cdot 23}
$$

and

$$
1 - S^2_{z_1 \cdot 23} = R^2_{1 \cdot 23}
$$



it follows that

$$
R^2_{1 \cdot 23} = S^2_{z_1'}
$$

In words, $R^2_{1 \cdot 23}$, which corresponds to the proportion of variance explained
by variables 2 and 3, is equal to $S^2_{z_1'}$, or the variance of the predicted
standard scores.

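On the other hand, note that, since

$$
Z_1' = B^*_2 Z_2 + B^*_3 Z_3
$$

we can indicate the value of $S^2_{z_1'}$ as

$$
R^2_{1 \cdot 23} = S^2_{z_1'} = \frac{\sum (Z_1')^2}{N}
= \frac{\sum (B^*_2 Z_2 + B^*_3 Z_3)^2}{N}
= \frac{B^{*2}_2 \sum Z_2^2 + B^{*2}_3 \sum Z_3^2 + 2 B^*_2 B^*_3 \sum Z_2 Z_3}{N}
$$

which is

$$
R^2_{1 \cdot 23} = B^{*2}_2 + B^{*2}_3 + 2 B^*_2 B^*_3 r_{23} \tag{A2.15}
$$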

In short, the predicted variance, which corresponds to the "explained"
variance, can be broken down into three additive components. Furthermore,
we see that the relative importance of the variables $X_2$ and $X_3$ in
"explaining" variation in $X_1$ can be judged by the magnitude of the squares
of the beta coefficients. The third term in formula (A2.15) represents
a joint contribution which is a function of the amount of correlation
between the two predicting variables.
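The three-way breakdown of the explained variance can be illustrated with a short computation. The sketch below is an illustration only: the function name `r2_components` and the correlations are hypothetical, and the beta weights are obtained from the two-predictor normal equations.

```python
def r2_components(r12, r13, r23):
    """Split R^2 into two squared beta weights and a joint term."""
    d = 1.0 - r23 ** 2
    b2 = (r12 - r13 * r23) / d
    b3 = (r13 - r12 * r23) / d
    direct2 = b2 ** 2              # contribution attributed to variable 2
    direct3 = b3 ** 2              # contribution attributed to variable 3
    joint = 2.0 * b2 * b3 * r23    # joint contribution via r23
    r2 = b2 * r12 + b3 * r13       # squared multiple correlation
    return direct2, direct3, joint, r2

# Hypothetical correlations among the criterion (1) and predictors (2, 3).
direct2, direct3, joint, r2 = r2_components(0.6, 0.5, 0.3)
shares = [part / r2 for part in (direct2, direct3, joint)]
print(direct2, direct3, joint, r2, shares)
```

Note that the joint term takes the sign of the product $B^*_2 B^*_3 r_{23}$, so it can be negative; in that case the two squared beta weights sum to more than the explained variance, and the shares should be read with that in mind.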

A2.2. More Than Three Variables. The extension of multiple
correlation and regression to include any number of variables involves
the same principle as for the three-variable case. In other words,
the interpretation of the regression and correlation coefficients is the
same for n as for 3 variables, and the extension of the formulas
should be obvious.
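One way to make the extension concrete is to write the normal equations for any number of predictors as a linear system in the predictor intercorrelations and solve it. The sketch below is an illustration under stated assumptions, not the report's own procedure: the correlation values are hypothetical, and a small Gaussian elimination routine stands in for whatever solution method one prefers.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x

def beta_and_r2(r_pred, r_crit):
    """Beta weights and R^2 from predictor and criterion correlations."""
    betas = solve(r_pred, r_crit)
    r2 = sum(b * r for b, r in zip(betas, r_crit))
    return betas, r2

# Hypothetical correlations: three predictors (variables 2, 3, 4).
r_pred = [[1.0, 0.3, 0.2],
          [0.3, 1.0, 0.4],
          [0.2, 0.4, 1.0]]
r_crit = [0.6, 0.5, 0.4]   # r12, r13, r14
betas, r2 = beta_and_r2(r_pred, r_crit)

# The two-predictor case reduces to the three-variable formulas.
betas_two, r2_two = beta_and_r2([[1.0, 0.3], [0.3, 1.0]], [0.6, 0.5])
print(betas, r2, r2_two)
```

The squared multiple correlation is again the sum of each beta weight times the corresponding criterion correlation, which is the n-variable counterpart of (A2.10).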































